ABSTRACT


We obtain two-dimensional Galois transform representations of images of different sizes.
Calculation of the Galois transform coefficients has been carried out by using the FFT, a
systolic array and Horner's rule. Four T414 Transputers and five T800 Transputers have been
used in linear and mesh configurations for this purpose. Various algorithms are evaluated by
comparing sequential, parallel one-node and parallel 8-node timings for different image
sizes. As a concrete application, an FFT-based configuration with mesh-connected processors
is used to calculate Galois coefficients for the IBM font of the English alphabet and Arabic
numerals. The coefficients are used to identify these characters by comparing them with
coefficients stored in computer memory. Two algorithms have been implemented for this
purpose. Their performance is compared to that of a straight sample-domain algorithm in
terms of the average number of comparisons required for identification in each case.
CHAPTER ONE


INTRODUCTION 


1.1 MOTIVATION

Man has been living on earth since time immemorial. He had to fight for his
survival. In order to survive, he needed to identify his enemies and friends quickly.
Therefore he has developed the ability to identify and distinguish between things
instantaneously. However, his survival was not dependent on his ability to deal with large
numbers and therefore his ability in that field is limited. Given two 100-digit numbers,
man would take hours to multiply them whereas a simple microprocessor can do it in
microseconds. On the other hand, a microprocessor/computer has a very limited ability to
identify things.

With advances in science and technology, automation has replaced manual
effort. Computers encompass a wide range of subjects including image processing. The
problems in the field of image processing which draw our attention are:

(1) Can computers be used for identification purposes?

(2) Can a method/methods be devised to be used for differentiating
between two images which are almost similar?

(3) Is the above method dependent upon the type of image?

(4) Can this method operate in real-time conditions?

(5) What are the applications in which this method can be used?

(6) Is the method commercially viable?



1.2 SCOPE OF THE WORK


In this thesis we will try to tackle some of the above questions. Further,
we will also try to devise some methods for identifying the English language alphabet and
Arabic numerals. As the number of methods that can be devised is very large, we have
restricted our attention to the methods which use Galois coefficients.

The method used by the human brain for identifying images is based on the
pattern matching of an image with images stored in its memory. This process appears so
natural (i.e. without any physical labor for a human being) that we do not comprehend the
vast neural network required for such quick matching. Some advances have been made in
fabricating large neural networks for achieving pattern matching, but the huge storage
requirements inhibit the use of this technology in present-day conditions. In this thesis
we will try to devise a process that can achieve the purpose in real-time conditions.
A method to achieve the above purpose on the computer should be
based on some intrinsic property/transform of the image which is computer friendly.

Here a computer-friendly property/transform of an image means that the
method using the property/transform is specially suited for the computer in the same way as
pattern matching is suited for the human brain. As computers are basically large number
crunchers, the method developed can afford to be calculation intensive.

One of the properties intrinsic to an image is the Galois transform of
the image (as it is a one-to-one transform). We have to see whether the Galois transform of
an image can be used for identifying the image. Since the Galois transform of an image is
computation intensive, we will be using parallel processing for its real-time
implementation. We will then use the above transform for representation and identification
of English language alphabets and Arabic numerals.


1.3 ORGANIZATION OF THE WORK


Chapter 2 gives the necessary algebraic background for calculation of the Galois
transform in the two-variable case. It also discusses the possible conjugacy constraints and
conjugacy classes.

In Chapter 3, we discuss the transputer and Occam facility that we have used
for developing parallel algorithms. We also discuss the hardware setup of the transputer
network system and general algorithm design considerations.

In Chapter 4, we develop the algorithms for calculation of coefficients. They
are based on the FFT (linear arrangement of processors), the FFT (processors in mesh
arrangement), Horner's rule and the systolic array.

In Chapter 5, we discuss the algebraic properties of Galois coefficients for 8x8
binary images of the alphabet. Also, the three methods that have been developed for
identification are discussed and compared. We also evaluate the advantages and drawbacks
of each method relative to the others.

In Chapter 6, we compare the sequential, parallel-one-node and
parallel-eight-node timings in the calculation of coefficients for different algorithms. We
conclude with a discussion of the scope for future work.


CHAPTER 2 


THEORY OF GALOIS SWITCHING FUNCTIONS 

A combinatorial network with k inputs and n outputs may be represented by
a set of n switching functions of k variables over GF(2). By employing finite fields, it is
possible to represent a set of n functions of k variables by a single-variable polynomial over
an appropriate extension field of GF(2). The functions described by these polynomials,
which essentially represent mappings from GF(2^k) to GF(2^n), are called 1-dimensional
Galois switching functions (GSFs). The coefficients of this polynomial (termed the Galois
polynomial, GP) are called Galois coefficients. These polynomials have a well-defined algebraic
structure and possess remarkable properties based on Frobenius cycles. The concept of 1-D
GSFs described by a single variable may be extended to multi-dimensional GSFs, especially
two-dimensional GSFs described by two-variable GPs. The 2-D GSFs are particularly
useful for representation and processing of images (pictorial data) represented in the form
of two-variable GPs. We will first confine our attention to 1-D GSFs and then study in
detail the properties of 2-D GSFs.

2.1 1-D GALOIS SWITCHING FUNCTION (GSF) [5]

Any non-zero element of a finite field can be represented by a power
(polar form) of a primitive element α and at the same time can be represented by a
polynomial in α

$c_{k-1}\alpha^{k-1} + c_{k-2}\alpha^{k-2} + \cdots + c_1\alpha + c_0, \qquad c_i \in GF(2)$

if the minimal polynomial over GF(2) of α is given. Thus we can take the set of coefficients
c_{k-1}, c_{k-2}, ..., c_0 as the cartesian representation of finite field elements. For example, let α be
the primitive element of GF(2^3) and x^3 + x^2 + 1 be the minimal polynomial over GF(2) of α;
then the truth table can be written as

polar      cartesian
form       form
           x2 x1 x0

0          0  0  0
1          0  0  1
α          0  1  0
α^2        1  0  0
α^3        1  0  1
α^4        1  1  1
α^5        0  1  1
α^6        1  1  0
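
As an illustrative sketch (not part of the original work), the truth table can be regenerated by starting from 1 and repeatedly multiplying by α, reducing modulo the minimal polynomial x^3 + x^2 + 1 whenever the degree reaches 3. The function name `gf_power_table` and the bit-mask encoding of the polynomial are assumptions made for this example.

```python
def gf_power_table(k=3, poly=0b1101):
    """Map each polar-form label to its cartesian form (an integer bit pattern).

    `poly` encodes the minimal polynomial as a bit mask; 0b1101 is x^3 + x^2 + 1.
    Bit i of each value is the coefficient c_i of alpha^i.
    """
    table = {"0": 0}
    value = 1                      # alpha^0
    for j in range(2 ** k - 1):
        table[f"alpha^{j}"] = value
        value <<= 1                # multiply by alpha
        if value & (1 << k):       # degree k reached: reduce modulo poly
            value ^= poly
    return table


if __name__ == "__main__":
    for label, value in gf_power_table().items():
        print(f"{label:8s} {value:03b}")
```

Each printed bit pattern is read as x2 x1 x0, matching the column order of the table.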


A polynomial expression for a 1-D GSF is to be obtained, in which the value of the
polynomial f(x) at x = α^j is equal to the value of the function at index α^j. This f(x) can be
written as

[equation (2.1.1): expansion of f(x) in powers of x with the Galois coefficients a_j as
coefficients; the individual terms are not legible in the source]

The coefficient vector (..., a_c), where c = 2^k − 2, and the function value vector
(..., f_c), where c = 2^k − 2, can be related through nonsingular 2^k × 2^k matrices given below.

[matrices not legible in the source]

One more interesting property of Galois coefficients can be seen by observing the
above matrix. It can be broken into four parts, one of which is a block H.

This H matrix is a DFT matrix of order 2^k − 1 over an appropriate extension field of
GF(2); therefore the coefficients and the DFT of the function are related. The relations are given below.
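2.2 TWO DIMENSIONAL GSFs

2.2.1 ALGEBRAIC RELATIONS [5]

Any 2-D GSF f(x_1, x_2) can be represented by a two-variable GP: a double sum over
j_1 and j_2 of the coefficients a_{j_1 j_2} multiplied by powers of x_1 and x_2 (2.2.1.1)

[the exponents of x_1 and x_2 in (2.2.1.1) are not legible in the source]

where j_1 = −∞, 0, 1, 2, ..., 2^{k_1} − 2 and
j_2 = −∞, 0, 1, 2, ..., 2^{k_2} − 2.

The coefficients are given by

[equation (2.2.1.2a) is not legible in the source]

$a_{j_1,-\infty} = \sum_{x_1} x_1^{j_1}\, f(x_1, \alpha^{-\infty})$   (2.2.1.2b)

$a_{-\infty,j_2} = \sum_{x_2} x_2^{j_2}\, f(\alpha^{-\infty}, x_2)$   (2.2.1.2c)

[equation (2.2.1.2d): a double sum over x_1 and x_2 involving f(x_1, x_2); the subscripts
and weighting factors are not legible in the source]

where
j_1 = −∞, 0, 1, 2, ..., 2^{k_1} − 2 and
j_2 = −∞, 0, 1, 2, ..., 2^{k_2} − 2,
α is a primitive element of GF(2^{k_i}), and the coefficients (a's) ∈ GF(2^L), L being the
L.C.M. of n, k_1 and k_2.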

If we arrange the function values f(x_1, x_2) in the form of a 2^{k_1} × 2^{k_2} matrix, then the
coefficients of the two-variable GP can be obtained by

(1) Computing the 1-D Galois transform coefficients of the rows of the
matrix, which represent a mapping from GF(2^{k_1}) to GF(2^n), and replacing the rows with
the resulting coefficients, which belong to, say, GF(2^{L_1}), L_1 being the L.C.M. of k_1 and n,
followed by

(2) Computing the 1-D GT coefficients of the resulting columns, which
now represent a mapping from GF(2^{k_2}) to GF(2^{L_1}). The resulting final coefficients belong
to GF(2^L), L being the L.C.M. of k_2 and L_1.

2.2.2 CONJUGACY RELATIONS [5]


If at least one k_i does not divide n, the coefficients a_{j_1 j_2} satisfy the conjugacy
relations given by

$(a_{-\infty, j_2})^Q = a_{-\infty,\ Q j_2\ (\mathrm{mod}\ 2^{k_2}-1)}$   (2.2.2.1a)

$(a_{j_1 j_2})^Q = a_{Q j_1\ (\mathrm{mod}\ 2^{k_1}-1),\ Q j_2\ (\mathrm{mod}\ 2^{k_2}-1)}$   (2.2.2.1b)

where j_i = −∞, 0, 1, 2, ..., 2^{k_i} − 2 and Q = 2^n.

If both k_i divide n, then the fields to which the function values and the coefficients
belong are the same if

(1) the function values belong to GF(2^n), and not to any of its subfields, or

(2) all the function values belong to GF(2^n) as well as to a subfield of it,
namely GF(2^{n_1}), and both k_i divide n_1 also.

Thus in cases (1) and (2), the GP coefficients exhibit trivial conjugacy relations and
they belong to GF(2^n) and GF(2^{n_1}) respectively.

(3) If all the function values belong to GF(2^n) as well as to a subfield of it,
namely GF(2^{n_1}), and if at least one k_i does not divide n_1, then a conjugacy relation exists
(given by 2.2.2.1 with Q = 2^{n_1}) and the coefficients belong to GF(2^L), where L = L.C.M. of
n_1, k_1 and k_2.

2.2.3 NUMBER OF FROBENIUS CYCLES [5]

Frobenius cycles exist only if GF(2^n) is a proper subfield of GF(2^L).

The number of Frobenius cycles in the case of two-variable GPs is given by

$\mathrm{nfrob} = 1 + \sum_{D_1 | M_1} \frac{\phi(D_1)}{\exp_Q(D_1)} + \sum_{D_2 | M_2} \frac{\phi(D_2)}{\exp_Q(D_2)} + \sum_{D_1 | M_1} \sum_{D_2 | M_2} \frac{\phi(D_1)\,\phi(D_2)}{\mathrm{L.C.M.}(\exp_Q(D_1),\ \exp_Q(D_2))}$   (2.2.3.1)

where M_1 = 2^{k_1} − 1, M_2 = 2^{k_2} − 1, Q = 2^n, exp_Q(D_i) = e is the least positive integer
such that Q^e = 1 mod D_i, and φ(D_i) is Euler's phi function (i.e. the number of positive
integers not exceeding D_i which are relatively prime to D_i).


Example: Let n = 1, k_1 = 3, k_2 = 3.
Then M_1 = M_2 = 7, Q = 2.

The divisors of 7 are 1 and 7,
exp_2(1) = 1 and exp_2(7) = 3,
φ(1) = 1 and φ(7) = 6.

Using (2.2.3.1),

nfrob = 1 + 1/1 + 6/3 + 1/1 + 6/3 + 1×1/L.C.M.(1,1) + 1×6/L.C.M.(1,3)
        + 6×1/L.C.M.(3,1) + 6×6/L.C.M.(3,3)

nfrob = 24
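
The arithmetic of this example can be checked mechanically. The following is a minimal sketch that evaluates formula (2.2.3.1) directly; the helper names (`phi`, `exp_q`, `n_frob`) are hypothetical and chosen only for this illustration.

```python
from math import gcd


def phi(d):
    """Euler's phi: count of integers in 1..d that are coprime to d."""
    return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)


def exp_q(q, d):
    """Least positive e with q**e = 1 (mod d); d is assumed coprime to q."""
    if d == 1:
        return 1
    e, power = 1, q % d
    while power != 1:
        power = (power * q) % d
        e += 1
    return e


def divisors(m):
    return [d for d in range(1, m + 1) if m % d == 0]


def lcm(a, b):
    return a * b // gcd(a, b)


def n_frob(n, k1, k2):
    """Number of Frobenius cycles according to formula (2.2.3.1)."""
    q = 2 ** n
    m1, m2 = 2 ** k1 - 1, 2 ** k2 - 1
    single1 = sum(phi(d) // exp_q(q, d) for d in divisors(m1))
    single2 = sum(phi(d) // exp_q(q, d) for d in divisors(m2))
    double = sum(
        phi(d1) * phi(d2) // lcm(exp_q(q, d1), exp_q(q, d2))
        for d1 in divisors(m1)
        for d2 in divisors(m2)
    )
    return 1 + single1 + single2 + double


if __name__ == "__main__":
    print(n_frob(1, 3, 3))
```

The integer divisions are exact because exp_Q(D) is the multiplicative order of Q modulo D and therefore divides φ(D).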


The conjugacy relation in this case would be

$(a_{j_1 j_2})^2 = a_{2 j_1\ (\mathrm{mod}\ 7),\ 2 j_2\ (\mathrm{mod}\ 7)}$

j_1 = −∞, 0, 1, ..., 6;   j_2 = −∞, 0, 1, ..., 6

Now we list the conjugacy classes:

(1)  {(−∞,−∞)}
(2)  {(−∞,0)}
(3)  {(0,−∞)}
(4)  {(0,0)}
(5)  {(−∞,1), (−∞,2), (−∞,4)}
(6)  {(−∞,3), (−∞,6), (−∞,5)}
(7)  {(0,1), (0,2), (0,4)}
(8)  {(0,3), (0,6), (0,5)}
(9)  {(1,−∞), (2,−∞), (4,−∞)}
(10) {(1,0), (2,0), (4,0)}
(11) {(1,1), (2,2), (4,4)}
(12) {(1,2), (2,4), (4,1)}
(13) {(1,3), (2,6), (4,5)}
(14) {(1,4), (2,1), (4,2)}
(15) {(1,5), (2,3), (4,6)}
(16) {(1,6), (2,5), (4,3)}
(17) {(3,−∞), (6,−∞), (5,−∞)}
(18) {(3,0), (6,0), (5,0)}
(19) {(3,1), (6,2), (5,4)}
(20) {(3,2), (6,4), (5,1)}
(21) {(3,3), (6,6), (5,5)}
(22) {(3,4), (6,1), (5,2)}
(23) {(3,5), (6,3), (5,6)}
(24) {(3,6), (6,5), (5,3)}


We will be using these conjugacy constraints in the calculation of Galois coefficients
of alphabets and numerals of size 8x8.
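
The list of classes can also be produced by brute force. The sketch below, offered only as an illustration, repeatedly applies the index map (j_1, j_2) → (Q j_1 mod M_1, Q j_2 mod M_2) and collects the orbits. Representing −∞ by the string "-inf" and treating it as fixed by the map are assumptions of this example, as is the function name `conjugacy_classes`.

```python
INF = "-inf"  # stands for the index -infinity (the zero field element)


def step(j, q, m):
    """Apply j -> q*j (mod m); the index -infinity is left unchanged."""
    return j if j == INF else (j * q) % m


def conjugacy_classes(n, k1, k2):
    """Return the orbits of index pairs under (j1, j2) -> (Q*j1, Q*j2)."""
    q = 2 ** n
    m1, m2 = 2 ** k1 - 1, 2 ** k2 - 1
    seen, classes = set(), []
    for j1 in [INF] + list(range(m1)):
        for j2 in [INF] + list(range(m2)):
            current, orbit = (j1, j2), []
            # The map is a bijection, so this walks one full cycle and stops.
            while current not in seen:
                seen.add(current)
                orbit.append(current)
                current = (step(current[0], q, m1), step(current[1], q, m2))
            if orbit:
                classes.append(orbit)
    return classes


if __name__ == "__main__":
    for number, orbit in enumerate(conjugacy_classes(1, 3, 3), start=1):
        print(number, orbit)
```

The order in which the classes are printed follows the loop order and need not match the numbering used in the list.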



CHAPTER 3


TRANSPUTER AND OCCAM

3.1 TRANSPUTER [2]

3.1.1 INTRODUCTION

The word Transputer may be interpreted as a contraction of the words Transceiver
and computer. The interpretation suggests that the Transputer consists of a
communication system and a computational element. The Transputer can be used as a basic
element in multiprocessor systems. It can be distinguished from any other
processor by a number of built-in features, such as high processor speed, low chip count,
and simplicity of the system design. The chip, being very versatile, has received
considerable attention and popularity among concurrent system designers.

A Transputer can be used in a single-processor system or in networks to build high
performance concurrent systems. A typical member of the transputer product family is a single
chip containing processor, memory and point-to-point communication links. A network of
Transputers can be easily constructed using these links. As a microcomputer, the
Transputer is unusual in its ability to communicate with other Transputers. A variety of
different configurations can be built by hard-wiring Transputers together, with no separate
switching and forwarding network, limited only by the number of links provided on each
Transputer. The current Transputer systems have four to six links. Four links are enough
to allow an enormous range of useful configurations.

The major advantages of a Transputer-based system are:

Scalability: It is much easier to enhance a system by adding further transputers
to it than would be the case with other microprocessors.

Compatibility: the ease of replacing one transputer model with another
without major design changes in a system. This extends to mixing models of transputer
within one system.

3.1.2 ARCHITECTURAL DETAILS

Transputers are often described as RISC processors. The computational instructions
follow RISC principles closely, and they attain the benefits claimed for RISC architecture.
However, they also have a small number of important non-RISC instructions concerned
with scheduling and message passing.

The Transputer is unusual in its ability to execute many software processes at the
same time. A program can be run on a single Transputer, in which case the concurrency of
the processes will be simulated by hardware with no software intervention. Provided the
communication between the subprocesses is not too complicated, the same program can
also be distributed over several processors (transputers), in which case the component
processes will be run in real concurrency. Just as in a single processor, interprocess message
passing and the necessary synchronization (between directly connected Transputers) are
achieved in hardware, and no operating system is needed.

The inbuilt features of the Transputer are:
Instruction processor
Small amount of on-chip memory
Memory controller
DMA control for four independent fast links
A microcoded multitasking kernel
An elapsed-time clock

The best known Transputers are the T212, T414 and T800. The T212 is a 16-bit
processor; the other two are 32-bit processors. The T800 has a full IEEE floating-point
processor on chip.


3.1.2.1 INSTRUCTION PROCESSOR

The processor portion of a Transputer is a traditional microprocessor. The processor
normally obtains its instructions and data from the internal 4K RAM. Data and
instructions can also be obtained from the links. The processor provides 32-bit addressing.
Memory is addressed byte-wise and stored in 4-byte units. Software sees no difference
between on-chip and external memory except in speed.

The design of the Transputer processor exploits the availability of fast on-chip memory
by having only a small number of registers; the CPU contains six registers which are used
in the execution of a sequential process. The small number of registers, together with the
simplicity of the instruction set, enables the processor to have relatively simple (and fast)
data paths and control logic.

3.1.2.2 INSTRUCTION SET

As in RISC architecture, the transputer instruction set is designed for simple and
efficient compilation. All the instructions have the same format and are chosen to give a
compact representation of the operations which occur most frequently in programs.
The instruction size of the Transputer is 8 bits for most instructions. There are prefix
instructions which allow the operand to be extended to any length. Measurements show
that about 70% of the executed instructions are encoded in a single byte. Short instructions
improve the effectiveness of the instruction prefetch, which in turn improves processor
performance. There is an extra word of prefetch buffer, so the processor rarely has to wait
for an instruction fetch before proceeding. Since the buffer is short, there is little time
penalty when a jump instruction causes the buffer contents to be discarded. Also, the
instruction set is independent of processor word length, allowing the same microcode to be
used for Transputers with different word lengths.


3.1.2.3 MEMORY CONTROLLER

The memory controller can drive external dynamic RAM with no additional
circuitry. Together with the controller, the processor can address a linear address space of
4 Gbytes. The 32-bit wide memory interface uses multiplexed data and address lines and
provides a data rate of up to 4 bytes every 100 nanoseconds (40 Mbytes/sec) for a 30 MHz
device. The configurable memory controller provides all timing, control and DRAM refresh
signals for a wide variety of mixed memory systems.

3.1.2.4 PROCESS SCHEDULER

The transputer provides efficient support for concurrency and communication. It
has a microcoded scheduler which enables any number of concurrent processes to be
executed together, sharing the processor time. This removes the need for a software kernel.
The processor does not need to support the dynamic allocation of storage, as the
compiler is able to perform the allocation of space to the concurrent processes.

A process starts, performs a number of actions, and then terminates. Typically a
process is a sequence of instructions. A transputer can run several processes in parallel
(concurrently). Processes may be assigned either high or low priority. At any time, a
concurrent process may be

active    - being executed
          - on a list waiting to be executed

inactive  - ready to input
          - ready to output
          - waiting until a specified time

The scheduler operates in such a way that inactive processes do not consume any
processor time. The active processes waiting to be executed are held on a list. This is a
linked list of process workspaces, implemented using two registers, one of which points to
the first process on the list, the other to the last.


The IMS T800 supports two levels of priority. Priority 1 (low priority) processes are
executed whenever there are no active priority 0 (high priority) processes. High priority
processes are expected to execute for a short time.

If one or more high priority processes are able to proceed, then one is selected and
run until it has to wait for communication, a timer input, or until it completes processing.
If no process at high priority is able to proceed, but one or more processes at low priority
are able to proceed, then one is selected. Each process runs until it has completed its
action, but is descheduled whilst waiting for communication from another process or
Transputer, or for a time delay to complete. In order for several processes to operate in
parallel, a low priority process is only permitted to run for a maximum of two time slices
before it is forcibly descheduled at the next descheduling point. The time slice period is
5120 cycles of the external 5 MHz clock, giving ticks approximately 1 ms apart. A process
can only be descheduled on certain instructions, known as descheduling points. As a result,
an expression evaluation can be guaranteed to execute without the process being time
sliced part way through.

Whenever a process is unable to proceed, its instruction pointer is saved in the
process workspace and the next process is taken from the list. Process scheduling
pointers are updated by instructions which cause scheduling operations and should not be
altered directly. Actual process switch times are less than 1 μs, as little state needs to be
saved and it is not necessary to save the evaluation stack on rescheduling.

The processor provides a number of special operations to support the process
model, including start process and end process. When a main process executes a parallel
construct, start process instructions are used to create the necessary additional concurrent
processes. A start process instruction creates a new process by adding a new workspace to
the end of the scheduling list, enabling the new concurrent process to be executed together
with the ones already being executed. When a process is made active it is always added to
the end of the list, and thus cannot pre-empt processes already on the same list.
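
The queueing discipline described above can be sketched in software. This is only a simplified model, not the microcoded scheduler: each process is assumed to be a Python generator that yields at its descheduling points, and waiting (inactive) processes, timers and the two-time-slice limit are not modelled. The names `run` and `worker` are hypothetical.

```python
from collections import deque


def worker(steps):
    """A toy process: performs `steps` actions, yielding after each one."""
    for _ in range(steps):
        yield


def run(high, low):
    """Run (name, generator) pairs; return the order in which they executed.

    One FIFO list is kept per priority level. Level 0 (high) is always served
    first; a process that is still active rejoins the END of its own list, so
    it cannot pre-empt processes already waiting on that list.
    """
    ready = {0: deque(high), 1: deque(low)}
    trace = []
    while ready[0] or ready[1]:
        level = 0 if ready[0] else 1
        name, process = ready[level].popleft()   # front of the list
        try:
            next(process)                        # run to next descheduling point
        except StopIteration:
            continue                             # process terminated
        trace.append(name)
        ready[level].append((name, process))     # back of the list
    return trace


if __name__ == "__main__":
    print(run([("H", worker(2))], [("A", worker(2)), ("B", worker(1))]))
```

In the hardware the list is threaded through the process workspaces with a front and a back pointer register; a `deque` is used here merely as a convenient stand-in for that structure.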


The correct termination of a parallel construct is assured by the use of the end
process instruction. This uses a workspace location as a counter of the parallel construct
components which have still to terminate. The counter is initialized to the number of
components before the processes are started. Each component ends with an end process
instruction which decrements and tests the counter. For all but the last component, the
counter is non-zero and the component is descheduled. For the last component, the counter
is zero and the main process continues.
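
The counting scheme for terminating a parallel construct can be illustrated with a few lines of code. This is a sketch under the assumption that a plain integer field stands in for the workspace location; the class name `ParallelJoin` and its method are hypothetical.

```python
class ParallelJoin:
    """Counter of parallel-construct components that have yet to terminate."""

    def __init__(self, n_components):
        # Initialised to the number of components before they are started.
        self.remaining = n_components

    def end_process(self):
        """Decrement and test the counter.

        Returns True only for the last component to finish, which is the
        point at which the main process would continue.
        """
        self.remaining -= 1
        return self.remaining == 0


if __name__ == "__main__":
    join = ParallelJoin(3)
    print([join.end_process() for _ in range(3)])
```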

3.1.2.5 COMMUNICATIONS

Communication between the processes is achieved by means of channels. The
communication is point-to-point, synchronized and unbuffered. As a result, a channel
needs no process queue, no message queue and no message buffer. A channel between two
processes executing on the same Transputer is implemented by a single word in memory,
and a channel between processes executing on different Transputers is implemented by
point-to-point links. The processor provides a number of operations to support message
passing, the most important being input message and output message.

The input message and output message instructions use the address of the channel
to determine whether the channel is internal or external. This means that the same
instruction sequence can be used for both hard and soft channels, allowing a process to
be written and compiled without knowledge of where its channels are connected. The
communication takes place when both the inputting and outputting processes are ready
(synchronicity). Consequently, the process which first becomes ready must wait until the
second one is also ready. A process performs an input or output by loading the evaluation
stack with a pointer to a message, the address of a channel, and a count of the number of
bytes to be transferred, and then executing an input message or an output message
instruction.
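
The synchronized, unbuffered behaviour of a channel can be imitated with ordinary threads. The following sketch is an analogy only and does not reflect how the Transputer implements channels; the class `Channel` and its `send`/`recv` methods are hypothetical names, and the code assumes exactly one sender and one receiver per channel, in line with the point-to-point rule.

```python
import threading


class Channel:
    """Point-to-point rendezvous channel: send() returns only after the
    receiver has taken the value, so whichever side is ready first waits."""

    def __init__(self):
        self._slot = None
        self._item_ready = threading.Semaphore(0)
        self._item_taken = threading.Semaphore(0)

    def send(self, value):
        self._slot = value
        self._item_ready.release()   # tell the receiver a value is waiting
        self._item_taken.acquire()   # block until the receiver has copied it

    def recv(self):
        self._item_ready.acquire()   # block until the sender is ready
        value = self._slot
        self._item_taken.release()   # let the sender proceed
        return value


def demo(count=5):
    """Send the integers 0..count-1 from one thread to another."""
    channel = Channel()
    producer = threading.Thread(
        target=lambda: [channel.send(i) for i in range(count)]
    )
    producer.start()
    received = [channel.recv() for _ in range(count)]
    producer.join()
    return received


if __name__ == "__main__":
    print(demo())
```

Because there is no queue between the two sides, the single slot is never overwritten before it has been read.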
3.1.2.6 COMMUNICATION LINKS


A link between two transputers is implemented by connecting a link interface on
one Transputer to a link interface on the other Transputer by two one-directional signal
wires, along which data is transmitted serially. The two wires provide two channels, one in
each direction. This requires a simple protocol to multiplex data and control information.
Messages are transmitted as a sequence of bytes, each of which must be acknowledged
before the next is transmitted. A byte of data is transmitted as a start bit followed by a
one bit followed by eight bits of data followed by a stop bit. An acknowledgement is
transmitted as a start bit followed by a stop bit. An acknowledgement indicates that a
process was able to receive the data byte and that it is able to buffer another byte.

The protocol permits an acknowledgement to be generated as soon as the receiver
has identified a data packet. In this way the acknowledgement can be received by the
transmitter before all of the data packet has been transmitted, and the transmitter can
transmit the next data packet immediately. The IMS T414 transputer does not implement
this overlapping and achieves a data rate of 0.8 Mbytes per second using a link to transfer
in one direction. However, by implementing the overlapping and including sufficient
buffering in the link hardware, the IMS T800 more than doubles this data rate to 1.8
Mbytes per second in one direction, and achieves 2.4 Mbytes per second when the link
carries data in both directions.
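
The packet layout described above can be made concrete with a short sketch. Two details are assumptions of this example and are not stated in the text: the start bit is taken to be 1 and the stop bit 0, and the eight data bits are sent least significant bit first. The function names are hypothetical.

```python
START, STOP = 1, 0          # assumed bit values for the start and stop bits
ACK_PACKET = [START, STOP]  # acknowledgement: start bit followed by stop bit


def data_packet(byte):
    """Frame one byte: start bit, a one bit, eight data bits, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]   # assumed LSB first
    return [START, 1] + data_bits + [STOP]


def decode_data_packet(bits):
    """Recover the byte from an 11-bit data packet built by data_packet()."""
    if len(bits) != 11 or bits[0] != START or bits[1] != 1 or bits[10] != STOP:
        raise ValueError("not a well-formed data packet")
    return sum(bit << i for i, bit in enumerate(bits[2:10]))


if __name__ == "__main__":
    packet = data_packet(0xA5)
    print(packet, hex(decode_data_packet(packet)))
```

Each data byte therefore occupies eleven bit times on the wire and each acknowledgement two, which is why overlapping the acknowledgement with the data packet raises the achievable data rate.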

3.1.2.7 TIMER

The Transputers have two timer clocks which 'tick' periodically. The timers provide
accurate process timing, allowing processes to be descheduled until a specific time. In the
IMS T800 Transputer, one timer is accessible only to high priority processes and is
incremented every microsecond, cycling completely in approximately 4295 seconds.
The other is accessible only to low priority processes and is incremented every 64
microseconds, giving exactly 15625 ticks in one second. It has a full period of
approximately 76 hours.
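
The timer figures follow from simple arithmetic, sketched below. The one assumption, marked in the code, is that each timer is a free-running counter one 32-bit word wide. The time-slice period quoted in the scheduler description (5120 cycles of a 5 MHz clock) is included for comparison. All names are illustrative.

```python
TIMER_BITS = 32  # assumption: each timer is a single 32-bit word


def wrap_period_seconds(tick_microseconds, bits=TIMER_BITS):
    """Time taken by a free-running counter of `bits` bits to wrap around."""
    return (2 ** bits) * tick_microseconds / 1_000_000


# High priority timer: one tick per microsecond.
high_priority_period_s = wrap_period_seconds(1)

# Low priority timer: one tick per 64 microseconds.
low_priority_ticks_per_second = 1_000_000 // 64
low_priority_period_hours = wrap_period_seconds(64) / 3600

# Time slice: 5120 cycles of the external 5 MHz clock, in milliseconds.
time_slice_ms = 5120 / 5_000_000 * 1000

if __name__ == "__main__":
    print(f"high priority timer wraps after {high_priority_period_s:.0f} s")
    print(f"low priority timer: {low_priority_ticks_per_second} ticks/s, "
          f"wraps after {low_priority_period_hours:.1f} h")
    print(f"time slice: {time_slice_ms:.3f} ms")
```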


3.1.2.8 THE FLOATING POINT PROCESSOR

The IMS T800 has a full IEEE floating point processor on chip. The FPU operates
concurrently with the CPU. This means that it is possible to do address calculation in the CPU
whilst the FPU performs the floating point calculation. This can lead to significant
performance improvements in real applications which access arrays heavily. Performance
depends on many things, including clock and memory speeds; for the 20 MHz T800 these
figures are of the order of 10 RISC MIPS and 1 MFLOPS (these are not upper bounds).

As can be noticed from the above, all the links can be active at the same time as well as
the processor. Thus the Transputer can support nine truly concurrent activities (one link
can transfer data in both directions; of course, memory accesses have to be interleaved). On
a T800, the floating point processor operates in parallel with the instruction processor,
which gives a tenth level of concurrency at the hardware level (but both the processors are
controlled by a single instruction stream).

3.2 THE OCCAM

"Entities are not to be multiplied beyond necessity" was the philosophy of the original
implementation of Occam. Occam is the native language for Transputers. The design of
Occam has been heavily influenced by the work on Communicating Sequential Processes
(CSP), which gives a mathematical framework for specifying the behaviour of parallel
processes. Occam is based on the CSP model of computation, but with features chosen to
ensure efficiency of implementation. In this model, an application is decomposed into a
collection of communicating processes, and the processes communicate by passing
messages.

The Occam model is based on the idea of a process. The software building block is a
process. A system is designed in terms of an interconnected set of processes. Each process
can be regarded as an independent unit of design. Its internal design is hidden and it is
completely specified by the messages it sends and receives. Internally, each process can be
designed as a set of communicating processes. The system design is therefore hierarchically
structured. Occam processes do not share any variables or semaphores. Occam does not
require (nor support) shared memory. Messages pass from exactly one process to the other.
There are no multiple senders or receivers, no broadcasting, and no uncertainty about
where a message came from or where it is going. Messages are unbuffered, so sending and
receiving a message involves momentary synchronization between the two participating
processes. Messages are sent through static channels, as if through a circuit-switched
(rather than packet-switched) network.

To gain the most benefit from the Transputer architecture, the whole system can be
programmed in Occam. This provides all the advantages of a high level language, the
maximum program efficiency and the ability to use the special features of the Transputers.
The Occam model of concurrency is applicable equally to processes running on separate
processors and to processes running within a single processor. Since the processor can be
controlled only by one instruction stream, it is evident that the processes in one processor
cannot be truly concurrent. However, the processes can be multiprogrammed, just as on a
mainframe, so that the effect of concurrency is reproduced (apart from speed). This
'emulated concurrency', as distinct from 'real concurrency', is known as 'gratuitous
concurrency'. Within a Transputer, this multiprogramming is handled by hardware with no
need for any operating system.

Occam provides a framework for designing concurrent systems using Transputers
in just the same way that boolean algebra provides a framework for designing electronic
systems from logic gates. The system designer's task is eased because of the architectural
relationship between Occam and the Transputer. A program running in a Transputer is
formally equivalent to an Occam process, and so a network of Transputers can be described
directly as an Occam program.


Occam, when compiled for execution on a Transputer, is ideal for embedded
multiprocessor systems. Where it is required to exploit concurrency, but still to use
standard languages, Occam can be used as a harness to link modules written in selected
languages. Performance approaching that of assembled machine code can be achieved, both
by matching of the architecture to a specific language, and by the use of static memory
allocation, avoiding run-time memory range checking.

3.3 PROGRAMMING TRANSPUTERS [11]

The following are the points that concern the programmer of a concurrent system.

3.3.1 TOPOLOGY

The pattern in which the processors are connected together is known as the topology or
the configuration. The idea of configuration depends on the assumption that processors are
connected permanently, or at least for the life of a whole program, an assumption which is
broadly true for current Transputer systems.

It is the responsibility of the designer of the overall system to decide how to
configure the processors within it. Designing a topology for a large system can be difficult.
It can sometimes be guided by an obvious mapping of a problem (or a solution) onto
separate processors, but the designer is not always so fortunate and must often analyse
several difficult possibilities using what help is obtainable from concurrency theory, graph
theory, queueing theory etc. There is a strong tendency to fall back on standard,
straightforward and well understood topologies even though they may be far from optimal.

3.3.2 PLACEMENT

Describing systems in terms of Occam processes allows algorithmic issues to be
separated from the question of what hardware is going to perform those activities. This is a
useful abstraction. One would like to be able to express an algorithm in the form of a
program which is independent of hardware, so that it could subsequently be performed
using many different networks of processors. Each implementation would need a
specification of how many processors were needed, how they were to be connected, and
which processes were to be installed on which processors, but the specification ought not to
need any change in the program. The specification is called placement.

In practice, Transputer programs are not completely independent of their
placement. Unless it is carefully designed, a program will only run on one particular
network of Transputers, or on a small number of similar networks. To run on other
networks, the program itself will have to be changed. Sometimes the changes will be
minimal, but other cases may need extensive modifications. Programmers therefore have to
make a conscious effort to write programs which can easily be run on different
configurations.

3 3 3 NON DETERMINACY 

Sequential programmers are used to the idea of a bug. There are solid bugs and
intermittent bugs. Intermittent bugs are data dependent: a program run on the same
inputs will work every time or it fails every time. Concurrent programming has a third
type of bug: the bug that depends on the relative timing of concurrent processes. These are
very often not repeatable, even if the program is rerun on the same data.

The problem arises because all the processors are allowed to run at their own speed.
There is no attempt to constrain the processors into a lock step. Thus the order of events
can change from one test run to another. It is obviously important at the design phase not
to assume an exact ordering of the events. One would also expect it to be impossibly hard
to test and debug Occam software, yet programmers commonly do succeed. The key to
success is to reduce the need for testing to an absolute minimum, and to understand
exactly what Occam defines as the effect of a program and what it leaves undefined. Occam
is as precise as any other language about what individual processes do, but it cannot
specify the relative timing of concurrent processes (otherwise the whole point of
concurrency would be lost).

It is the programmer's responsibility to ensure that when a program terminates it
has completed the required function, regardless of the order in which things happened
between starting and termination. Since Occam does not guarantee determinacy, it has
been designed to express and handle non-determinacy very simply, flexibly, and elegantly.
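
The requirement that the final result must not depend on the order of events can be
illustrated with a small sketch. It is written in Python with threads purely for
illustration (the programs in this thesis are Occam processes on Transputers), and the
names run_workers and worker are hypothetical. Each worker tags its result with the input
it belongs to, so the answer is rebuilt correctly whatever the arrival order turns out to be.

```python
import queue
import threading


def run_workers(values):
    """Square each value in its own thread and collect the tagged results."""
    results = queue.Queue()

    def worker(v):
        # Each worker runs at its own speed, so the arrival order is not fixed.
        results.put((v, v * v))

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    arrived = [results.get() for _ in values]
    # Do not rely on arrival order: rebuild the answer from the tags.
    return dict(arrived)


if __name__ == "__main__":
    print(run_workers([1, 2, 3, 4]))
```

The list arrived may differ from run to run, but the dictionary built from it does not.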

3 3 4 DEADLOCK 

A classic problem in concurrent systems is deadlock. This is an affliction whereby
one part of a program is waiting for another to do something, the other is waiting for the
first to do something else, and since both are waiting, neither can do what the other expects.
There may, of course, be a set of processes involved, rather than a pair.

All Transputer deadlocks are essentially the same in that they involve a closed
chain of processes, each trying to communicate with another, but with no pair of them
willing to participate in any one communication. There is an enormous variety of ways
which may lead to deadlock, and that makes it hard to avoid. It is also difficult to analyse
and hard to cure after it is found to occur in a program. The methods used are writing
only simple code, supported by intuition and back-of-envelope sketches.
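
The "closed chain of processes" can be made concrete with a small sketch. It is an
illustration in Python, not a tool used in this work. The function name find_wait_cycle
and the dictionary representation (each process mapped to the one process it is blocked on)
are assumptions made for the example.

```python
def find_wait_cycle(waits_for):
    """Return a closed chain of blocked processes as a list, or None.

    waits_for maps each process name to the single process it is blocked on,
    or to None if it is not blocked.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of "is waiting for" until it ends or repeats.
        while current is not None and current not in seen:
            seen.append(current)
            current = waits_for.get(current)
        if current is not None:
            # current was visited before: the chain closes on itself there.
            return seen[seen.index(current):]
    return None


if __name__ == "__main__":
    print(find_wait_cycle({"P1": "P2", "P2": "P3", "P3": "P1"}))
    print(find_wait_cycle({"P1": "P2", "P2": None}))
```

A non-empty result means that every process in the returned chain is waiting for the next
one, so none of them can proceed.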

3 3 5 GENERAL ALGORITHM DESIGN CONSIDERATIONS 

Most concurrent algorithms are only loosely related to their sequential
equivalents. The conversion of a sequential algorithm into a concurrent algorithm, often
called parallelization, is too ill defined and difficult to be automated and is often attempted
by hand. The following points should be given due consideration while developing
concurrent algorithms.
Granularity 

The term granularity refers to the size of the tasks that are distributed as concurrent
processes. In a matrix multiplication, all the elements of the result can, in principle, be
evaluated concurrently because none of the calculations depends on the result of any other.
That would be fine-grain concurrency. A more coarse-grain division of labour could
evaluate one quarter of the result, working sequentially on one processor, while evaluating
the other three quarters on three other processors.

Intuitively, one would expect fine-grained concurrency to deliver results faster but
to use more processors. With the Transputers, a finer-grain division of work requires more
information to be passed between processors. Reducing the grain size below some optimum
value for a particular problem can incur message passing costs which grossly outweigh the
expected benefits.

Granularity is a particular problem in the conversion of existing sequential code into
concurrent methods. It is relatively easy to recognize fine-grain potential parallelism,
e.g. several assignments whose order is immaterial, or a simple loop. It is much harder to
identify the larger units of a program which might safely be run concurrently.
Performance measure and efficiency

One measure of the quality of a concurrent system is its efficiency in using the
processors. If a single processor solution takes n seconds, it is desirable to achieve a solution
with m processors in n/m seconds. Complete efficiency is never attainable, but the user
tries to get near to it. Efficiency must fall off as more processors are added.
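
The efficiency measure described above can be written down directly. The sketch below
is in Python for illustration. The function names and the timings in the example are
hypothetical and are not measurements from this work.

```python
def speedup(t_single, t_parallel):
    """Ratio of the single processor time to the multi-processor time."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, processors):
    """Speedup per processor; 1.0 would mean the ideal n/m seconds was reached."""
    return speedup(t_single, t_parallel) / processors


if __name__ == "__main__":
    # Hypothetical timings: 80 s on one processor, 12.5 s on eight processors.
    print(speedup(80.0, 12.5))
    print(efficiency(80.0, 12.5, 8))
```

With these assumed timings the ideal time on eight processors would be 10 s, so the
efficiency comes out below 1.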
Balancing communication and processing 

Sometimes the concurrent algorithm may show very poor efficiency. The reasons
could be:

The computation has not been divided equally: one of the processors still has to do
far more than 1/n th of the work.

The processes are not independent: they have to share or communicate information via
links, and some of them spend a lot of time waiting for others, during which time they
cannot continue with the computation.

Even if time is not wasted in waiting for messages, there may be a lot of time spent in
passing the messages.

These three possibilities are quite distinct. The first one has to be solved by load
balancing, i.e. attempting to equalize the computation load on each processor. The second
can be described as 'spurious concurrency'. It can occur by accident, and it is not easy to
predict, analyse or detect. Even when it is known that it is happening, it is not easily
cured. The third reason for inefficiency is a difficult optimization problem between
computation and communication times.
Software development 

Transputer software is mostly developed under the 'Transputer Development
System' (TDS) supplied by INMOS. TDS and related systems provide an integrated
environment for editing, compiling and running the programs. They are centered on the
folding editor, which embodies an elegant and general way of representing and handling
large amounts of Occam text within a small screen. They lack some of the support tools
that are common in other systems (e.g. in UNIX: diff, grep, multitasking, batch files,
aliases, email, login files).

The TDS is so much an integrated environment that its files are not easily handled
in other systems, not even in systems such as MS-DOS which is acting as the host or file
server for TDS. This makes it hard for the utility software on the host system to provide
the facilities that are missing from the TDS.

Programmers would very much like to have further help with the peculiar difficulties
of concurrent program development. The strongest demand at the moment, and the one
which seems nearest to fulfillment, is for software tools for profiling, monitoring, and
debugging.

Needs remaining unsatisfied

There are several areas in which users' experience has established a requirement
that no hardware or software has yet fulfilled. These users need a more comprehensive (and
more familiar) software development environment, with tools for designing and
constructing concurrent programs as well as more normal services for sequential
programming, module management, etc. Extensive and usable monitoring and debugging
facilities should be available, and more accessible formal methods, with software tools to
support their use, would be welcome. The problems of file capacity, transfer rate and
back-up have still to be solved.

On the hardware side, it is widely felt that the Transputer should use its on-chip
memory as cache store, and that it should provide at least some support for memory
management and protection. More links per Transputer would be welcome, as would
automatic forwarding of messages between processors which are not directly connected.

Nevertheless the processing speed, the ease of multiprocessor interfacing and the
availability of a native programming language with attractive features make the Transputer
a very good candidate for multiprocessing.

3 4 SETUP OF A TRANSPUTER NETWORK SYSTEM [7] 

The major components of a processor network are:
Host computer (mostly a PC)

Interface unit

Processing element array (Transputers)

Interconnection networks
Host computer

The host computer is intended to provide system monitoring, data storage and
management. It generates global control codes and object codes of processor elements. It
can be a microcomputer, workstation, or a mainframe. We have used a PC-AT for this
purpose. Processor elements can be accessed by a procedure call on the host, or through an
interactive programmable command interpreter.




Interface unit

The interface unit is an interface between the host and the PE array. The interface unit,
connected to the host via the host bus or DMA, has the function of downloading,
uploading, buffering array data and handling interrupts. It supports high bandwidth
communication (accompanying high-speed processing) between the array and the host. For
balancing between the low bandwidth of the system I/O and the high bandwidth of the
processor array, sufficient buffering is provided.
Processing element array

A PE array consists of a number of processing elements with local memory. Each PE
effectively utilizes its data storage, thereby saving communication time.
Interconnection network

Interconnections within the PE array are provided by large switching networks which
provide flexibility of interconnections and high speed communication between the PEs.

3 4 1 HARDWARE SETUP OF THE TRANSPUTER NETWORK SYSTEM [1,2] 

A nine-node transputer network has been set up in the image processing lab at IIT
Kanpur. The setup contains two IMS B003 evaluation boards, each having four transputers,
and an IMS B004 transputer development system (TDS) board having one transputer. Of
these nine transputers, five are T800s and four are T414s. The host is a PC-AT (80386).
One of the transputers functions as the interface unit, known as the root processor, with 4
Mbytes of on-board RAM. The root is connected to the host via a link adapter (IMS C012).
Most of the time the root transputer will be used for executing the I/O routines required for
the host file server. The PEs are individual transputers, each with 256 Kbytes of RAM on
board. Transputer links are connected by a programmable switch known as the Link switch.

Link adapter

It connects the host and the root transputer. It provides for full duplex
transputer link communication by converting bi-directional serial link data into a parallel
data stream.
Host

The host serves as a file server, as memory management and a file system are
not supported by transputers. This facility of the host is accessed by a program called the
server. The server reads a DOS file to determine the network configuration, the programs to
be loaded and the boot order. The host loads the network via the root transputer by sending
loading information to it, which in turn boots and loads the transputers connected to it,
which again pass the booting and loading information forward, and so on until the whole
network has been booted and loaded.

Booting of the transputer network 

A communication protocol exists between the host and the transputer network to
direct the code to the desired place in each transputer. The bootstrap code for each
transputer is sent first. After all transputers are booted, the code of each of the procedures
allocated to processors is exported to the network, preceded by the necessary routing and
loading information. Following this, the code which calls the procedures is sent to each
processor.

Link switch

The transputer network is interconnected using the Link switch. It is a programmable link
switch designed to provide a full crossbar switch between 32 link inputs and 32 link
outputs. It uses the capabilities of VLSI to offer simple, easy to use and cheap
interconnections for computer systems. It introduces, on average, a 1.75 bit time delay
on the signal. The switch is programmed via a separate link called the configuration link.
In the setup, LINK0 of the root transputer is connected to the host and LINK1 is used as
the configuration link for the link switch, so only LINK2 and LINK3 on the root transputer
are free. Also, since the link switch supports only 32 connections, two links of the last (9th)
transputer cannot be used at present.



Hardware Setup 

3 4 2 CODE DEVELOPMENT [1]

Occam is used for program development under the TDS environment. The same
program can be developed on a single node or on a multi-node transputer network. On a
single node the program would be developed as a parallel program of n processes. It would
use soft channels for communication between the parallel processes. On a multi-node
network some changes have to be made:

1 The individual processes are to be defined as procedures whose parameters should
be only the hard channels, and they should be compiled separately.

2 The separately compiled procedures are 'PLACED' on individual transputers and
the program is then compiled to generate a network code file.

3 For running the program, the network link switch should be configured.

The single/multi-node program can be tested from the TDS environment, or it could
be a standalone program bootable by the external host by using the alien file server
routines available in the TDS environment.



CHAPTER 4 


ALGORITHMS FOR CALCULATING COEFFICIENTS 

In this chapter, we study and implement different algorithms for
computing Galois Transform coefficients. All the algorithms have been implemented on the
Transputer facility available in the IP-LAB. Also, a time comparison of these algorithms has
been done for the same input function. The algorithms are:

(1) USING SYSTOLIC ARRAY 

(2) POLYNOMIAL COMPUTATION USING HORNER'S RULE 

(3) FFT (USING LINEAR ARRANGEMENT OF PROCESSORS) 

(4) FFT (USING MESH ARRANGEMENT OF PROCESSORS) 

4 1 USING SYSTOLIC ARRAY [7, 10] 

According to H. T. Kung, "a systolic system is a network of processors which
rhythmically compute and pass the data through the system". Once a data item is brought
out from the memory, it can be used effectively at each cell it passes while being "pumped"
from cell to cell along the array.

The computational tasks can be classified into two families:

1 Compute-bound computation

2 I/O bound computation

In a computation, if the total number of operations is larger than the total number of
I/O operations, then the computation is compute-bound; otherwise it is I/O bound.
Speeding up an I/O bound computation requires an increase in memory bandwidth, which
is difficult in current technologies. Speeding up a compute-bound computation, however,
may be accomplished by using systolic arrays.
The main features of a systolic array are:
Synchronicity

Modularity and regularity: The array consists of modular processing elements
with homogeneous interconnections. Also, the network may be extended indefinitely.

Spatial and temporal locality: The array manifests a locally communicative
interconnection structure, i.e. spatial locality. There is at least one unit delay so that signal
transactions from one node to the next node can be completed, i.e. temporal locality.

Pipelinability: The array exhibits a linear rate pipelinability, i.e. it should
achieve O(M) speedup, in terms of the processing rate, where M is the number of
processing elements.

The major factors for adopting systolic arrays for special architectures are:

1 Simple and regular design

2 Concurrency and communication

3 Balancing computation with I/O

4 1 1 DESIGN METHODOLOGY FOR SYSTOLIC ARRAYS 

Due to fast progress in VLSI technology, algorithm oriented array architectures
appear to be effective, feasible and economic. Therefore a systematic methodology of
mapping computations onto systolic arrays has been developed.

Parallel algorithm expressions may be derived by two approaches:

1 Vectorization of sequential algorithm expressions

2 Direct parallel algorithm expressions, such as snapshots, recursive
equations, parallel codes, single assignment code, dependence graphs and so on

4 1 1 1 SYSTOLIC ARRAY DESIGN FOR MATRIX MULTIPLICATION [7]

As, conceptually, we have to do matrix multiplication to find Galois coefficients, we
will be designing a systolic array for that. If A and B are NxN matrices then their product is
given by C=AB

The elements of C are

$$c_{ij} = \sum_{k=0}^{N-1} a_{ik} b_{kj}$$

This equation implies that all the multiplications involved can be carried out at the
same time, since they do not have any dependence between each other. To get maximum
parallelism, we need to propagate input data to multipliers, two input links for each
multiplier. That means at least $2N^3$ communication links are necessary. As we only have
eight processors and the size of the array can go up to 256*256, we will have to implement
it in another way. What we have done is to divide the Galois matrix 2 2 1 into eight equal
parts (row wise), and rows of the input array are sent to each processor one by one. In each
processor vector multiplication takes place. Processors are arranged in a linear manner as
shown in the diagram below.
In each of the eight processors we place two rows from the above sixteen rows of the
Galois matrix. After this, each row of the input array is sent into the network one by one. In
each processor two vector multiplications take place, and the input row and the output are
transferred to the next processor.

4 1 2 ALGORITHM FOR SYSTOLIC ARRAY IMPLEMENTATION 

Steps in the implementation of the sequential algorithm are

1 Read the input data in an array a[i][j]

2

a Start the computation for i=0, i.e. send the ith row (here the first row) of the
input. Multiply the row with the Galois matrix given in section 2 1
b Receive the output in array b

3 Repeat step 2 for all values of $i = 1, 2, \ldots, 2^k - 1$ and put the values in b

4 Transpose the array b[i][j]

5 Repeat steps 2, 3 and 4 with b as the input array and c as the output array

In the computation of step 2, each row of the Galois matrix can be independently
multiplied with the input row; therefore all elements of a row of matrix b can be calculated
independently. The algorithm for any processor in the transputer network can be given as

1 Set up $2^k/8$ rows of the Galois matrix in the processor

2 Receive a row of the input array and the output elements from the previous
processor. Multiply this row with the part of the Galois matrix present in the processor and
calculate $2^k/8$ elements

3 Send the received input row and the total output (output from the previous processors
plus output computed in this processor) to the next processor. Go to step 2 again till the whole
input array has passed through it

Once the whole array has passed through the network, the transpose is taken and steps
1, 2 and 3 are repeated once again.
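
The per-processor steps above can be sketched as follows. This is an illustration in
Python, not the Occam code used on the Transputers, and it follows the data flow only: it
does not model the timing of a real pipeline. The function names are hypothetical. As a
simplifying assumption, ordinary integer arithmetic is used in place of Galois field
arithmetic, and any matrix whose row count is divisible by the number of stages stands in
for the Galois matrix.

```python
def split_rows(matrix, stages):
    """Give each pipeline stage an equal block of consecutive matrix rows."""
    per_stage = len(matrix) // stages
    return [matrix[s * per_stage:(s + 1) * per_stage] for s in range(stages)]


def stage_step(local_rows, input_row, partial_output):
    """Work of one processor: append its own dot products to the partial output."""
    own = [sum(g * x for g, x in zip(row, input_row)) for row in local_rows]
    # The input row is forwarded unchanged together with the longer output.
    return input_row, partial_output + own


def pipeline_transform(matrix, inputs, stages):
    """Pass every input row through all stages and collect the output rows."""
    blocks = split_rows(matrix, stages)
    outputs = []
    for input_row in inputs:
        partial = []
        for block in blocks:
            input_row, partial = stage_step(block, input_row, partial)
        outputs.append(partial)
    return outputs


if __name__ == "__main__":
    m = [[1, 0, 0, 0], [0, 2, 0, 0], [1, 1, 1, 1], [0, 0, 0, 3]]
    print(pipeline_transform(m, [[1, 2, 3, 4]], stages=2))
```

After the last stage each output row holds one element per matrix row, which is the
matrix-vector product for that input row.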
4 2 HORNER'S RULE IMPLEMENTATION [5]

Horner's rule is a standard method for polynomial computation, irrespective of
the type of polynomial under consideration. To describe the method of computation, let us
consider the GP

a x2^-2+ + a x2^-ln+ + a^x 

where ( = 2^2 
This may also be written as 

f(x)=(( (((a^x+a^)x+a^)x+a^)x+ +a^ px+a^)x+a_^ (4 2 1) 

The above equation suggests a procedure for polynomial computation, which is
illustrated in the figure given below.



multiplications and additions 

To get maximum parallelism, we need $N^2$ closed loop multiplier/adder circuits of the
above type. As we only have eight processors and the size of the array can go up to 256x256,
we divide the work among 8 transputers only, i.e. each Transputer will be doing the work of
$N^2/8$ such circuits.
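
One closed loop multiplier/adder circuit of the kind described above can be sketched in
software as follows. This is a Python illustration, not the Occam implementation. It
assumes GF(2^3) built on the polynomial x^3 + x + 1, which is one possible choice and is
not confirmed by the text, and it takes the coefficients in order from the highest power
down to the constant term. The names gf8_mul and horner are hypothetical.

```python
PRIMITIVE_POLY = 0b1011  # x^3 + x + 1, an assumed field polynomial for GF(2^3)


def gf8_mul(a, b):
    """Multiply two GF(2^3) elements held as 3-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a              # addition in GF(2^k) is bitwise XOR
        a <<= 1
        if a & 0b1000:
            a ^= PRIMITIVE_POLY      # reduce modulo the field polynomial
        b >>= 1
    return result


def horner(coeffs, x):
    """Evaluate a polynomial, highest power first, one multiply-add per coefficient."""
    acc = 0
    for c in coeffs:
        acc = gf8_mul(acc, x) ^ c
    return acc


if __name__ == "__main__":
    # Evaluate 3x^4 + 5x^2 + x + 7 at every element of the field.
    print([horner([3, 0, 5, 1, 7], x) for x in range(8)])
```

The loop body is the multiply-then-add step of the circuit, and the accumulator plays the
role of the value fed back around the loop.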

As we are dealing with two-variable polynomials, we have to apply the
above rule twice. As we know,

$$f(x_1, x_2) = \sum_{j_1} \sum_{j_2} a_{j_1 j_2} x_1^{j_1} x_2^{j_2} \qquad (4\ 2\ 2)$$

(1) Keeping $j_1$ constant (same row), we first calculate $\sum_{j_2} a_{j_1 j_2} x_2^{j_2}$
for each value of $x_2$ (using Horner's rule). The outputs obtained for a constant $j_1$, one
for each value of $x_2$ (each value being a power of $\alpha$), form row $j_1$ of an array b.
Thus we obtain array b.

(2) Keeping $j_2$ constant (same column), we calculate $\sum_{j_1} b_{j_1 j_2} x_1^{j_1}$
for each value of $x_1$ (using Horner's rule). The outputs obtained for a constant $j_2$, one
for each value of $x_1$, give the required result.


4 2 1 ALGORITHM FOR POLYNOMIAL COMPUTATION 

Steps in the implementation of the sequential algorithm are

1 Read the input data in an array a[i][j]

2

a Start the computation for i=0, i.e. put the values of the ith (here first)
row as the a's in equation 4 2 1

b Now take $x = \alpha^{-\infty} = 0$ in equation 4 2 1, compute the value and
put it in b[0][0]

c Repeat step 2 b for all values of $x = \alpha^{j-1}$ ($j = 1, \ldots, 2^k - 1$) and put
the results in b[0][j]

3 Repeat step 2 for all values of $i = 1, 2, \ldots, 2^k - 1$ and put the values in b[i][j]

4 Transpose the array b[i][j]

5 Repeat steps 2, 3 and 4 with b as the input array and c as the output array

Computation of step 3 does not depend on the output of step 2; therefore all rows of
array b can be calculated independently. Hence we calculate different rows
concurrently on different processors. The algorithm for any processor in the transputer
network can be given as

1 Receive the input row and compute Eq 4 2 1 for different values of x

2 Send the computed row back and receive the next row

The root processor reads input values and sends them to the first processor. The
first processor sends them to all processors, which compute and send back computed values
to the first processor, which in turn sends them to the root processor. Here the transpose is
taken and the above steps are repeated once again. The processor arrangement is shown
in the diagram given below.

                            proc 7
                              ||
                  proc 3    proc 5
                    ||        ||
host -- root -- proc 1 -- proc 2
                    ||        ||
                  proc 4    proc 6
                              ||
                            proc 8

4 3 FAST FOURIER TRANSFORM 

As we know, Galois coefficients are basically Fourier coefficients of a function; therefore
we can use the FFT algorithm. In the case of finite fields the size of the DFT matrix is
$(2^k - 1) \times (2^k - 1)$; therefore we cannot use the butterfly algorithm (which can only
be used for size $2^k$). Hence we are going to use the Cooley-Tukey algorithm.

4 3 1 COOLEY TUKEY ALGORITHM [6] 

This FFT algorithm is based on the strategy of changing a one dimensional
Fourier transform into a two dimensional Fourier transform which is easier to compute. The
Fourier transform of a vector v,

$$V_k = \sum_{i=0}^{n-1} w^{ik} v_i$$

(here w is the primitive element), as it is written requires on the order of $n^2$
multiplications and additions. If n is not a prime number then this 1-D FFT can be
converted to a 2-D FFT. This changes the computation to a form that is much more
efficient. To understand the Cooley-Tukey algorithm suppose that

$$n = n_1 n_2, \qquad i = i_1 + n_1 i_2, \qquad k = n_2 k_1 + k_2$$

where $i_1$ and $k_1 = 0, 1, \ldots, n_1 - 1$

and $i_2$ and $k_2 = 0, 1, \ldots, n_2 - 1$

Putting the above suppositions in the formula we get

$$V_{n_2 k_1 + k_2} = \sum_{i_2=0}^{n_2-1} \sum_{i_1=0}^{n_1-1} w^{(i_1 + n_1 i_2)(n_2 k_1 + k_2)} v_{i_1 + n_1 i_2}$$

Expand the product; because $w^n = 1$, the factor $w^{n_1 n_2 i_2 k_1}$ can be dropped. Also
we define the convention that

$$v_{i_1, i_2} = v_{i_1 + n_1 i_2} \quad \text{and} \quad V_{k_1, k_2} = V_{n_2 k_1 + k_2}$$

In this way input and output data vectors are mapped into two dimensional arrays.
In terms of two dimensional variables the formula becomes

$$V_{k_1, k_2} = \sum_{i_1=0}^{n_1-1} (w^{n_2})^{i_1 k_1} \left[ w^{i_1 k_2} \sum_{i_2=0}^{n_2-1} (w^{n_1})^{i_2 k_2} v_{i_1, i_2} \right] \qquad (4\ 3\ 1\ 1)$$

i.e. we have to first calculate
$w^{i_1 k_2} \sum_{i_2=0}^{n_2-1} (w^{n_1})^{i_2 k_2} v_{i_1, i_2}$ for each $i_1$ and $k_2$
(therefore $n_1 n_2^2 + n_1 n_2$ multiplications are required). Let the result be
$u_{i_1, k_2}$. Later, for each $k_1$ and $k_2$, we have to find
$\sum_{i_1=0}^{n_1-1} (w^{n_2})^{i_1 k_1} u_{i_1, k_2}$ ($n_1^2 n_2$ multiplications required).

Therefore number of multiplications $\approx n(n_1 + n_2) + n$
Number of additions $\approx n(n_1 + n_2 - 2)$
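
Equation 4 3 1 1 can be checked numerically with a short sketch. It is written in
Python for illustration and is not the Occam code of this work. It assumes the field
GF(2^4) built on x^4 + x + 1 with w = 2 (the element usually written as alpha), so that
n = 15 = 3 x 5. These choices, and the function names, are assumptions made for the
example.

```python
POLY = 0b10011  # x^4 + x + 1, an assumed field polynomial for GF(2^4)


def mul(a, b):
    """Multiply two GF(2^4) elements held as 4-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= POLY
        b >>= 1
    return r


def power(a, e):
    """a raised to a non-negative integer power e."""
    r = 1
    for _ in range(e):
        r = mul(r, a)
    return r


def dft_direct(v, w):
    """V_k = sum over i of w^(ik) v_i, computed straight from the definition."""
    n = len(v)
    out = []
    for k in range(n):
        acc = 0
        for i in range(n):
            acc ^= mul(power(w, (i * k) % n), v[i])
        out.append(acc)
    return out


def dft_cooley_tukey(v, w, n1, n2):
    """Same transform via i = i1 + n1*i2 and k = n2*k1 + k2."""
    n = n1 * n2
    u = [[0] * n2 for _ in range(n1)]
    for i1 in range(n1):
        for k2 in range(n2):
            acc = 0
            for i2 in range(n2):   # n2-point transform with kernel w^n1
                acc ^= mul(power(w, (n1 * i2 * k2) % n), v[i1 + n1 * i2])
            u[i1][k2] = mul(power(w, i1 * k2), acc)   # element-wise factor
    out = [0] * n
    for k1 in range(n1):
        for k2 in range(n2):
            acc = 0
            for i1 in range(n1):   # n1-point transform with kernel w^n2
                acc ^= mul(power(w, (n2 * i1 * k1) % n), u[i1][k2])
            out[n2 * k1 + k2] = acc
    return out


if __name__ == "__main__":
    data = list(range(1, 16))
    print(dft_cooley_tukey(data, 2, 3, 5) == dft_direct(data, 2))
```

For n = 15 with factors 3 and 5 the estimate n(n1 + n2) + n gives 135 multiplications,
against 225 for the direct form.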
Example

The Cooley-Tukey FFT can be visualized as mapping a 1-D array into a 2-D array, as
shown below for n = 21. The computation consists of a 3 point DFT on each column,
followed by element by element multiplication throughout the new array by $w^{i_1 k_2}$,
followed by a 7 point DFT on each row. Observe that the components of the transform V are
found arranged differently in the array than the components of the signal v; this is known as
address shuffling.

[Figure: two dimensional arrangement of the signal v and of the transform V for n = 21]




4 3 2 Algorithm for FFT (in 2-D case)

Steps in the implementation of the sequential algorithm are

1 Read the input data in an array a[i][j]

2 Start the computation for i=0, i.e. take the ith (here first) row. Take factors
of $n = n_1 n_2$, implement equation 4 3 1 1 and put the results in b[0][j] (first row)

3 Repeat step 2 for all values of $i = 1, 2, \ldots, 2^k - 1$ and put the values in b[i][j]

4 Transpose the array b[i][j]

5 Repeat steps 2, 3 and 4 with b as the input array and c as the output array

Computation of step 3 does not depend on the output of step 2; therefore all rows of
array b can be calculated independently. Hence we calculate different rows
concurrently on different processors. The algorithm for any (kth) processor in the transputer
network can be given as

1 Receive the kth row and compute Eq 4 3 1 1

2 Send the computed row back and receive the next row (k+8). Repeat from step 1 until
no rows remain
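
The row assignment implied by these two steps (row k first, then row k+8, and so on) can
be sketched as below. This is a Python illustration only. The function name is
hypothetical, and processors and rows are numbered from 0 here, which is an assumption since
the text does not say where the numbering starts.

```python
def rows_for_processor(k, total_rows, processors=8):
    """Rows handled by processor k: k, k + 8, k + 16, ... below total_rows."""
    return list(range(k, total_rows, processors))


if __name__ == "__main__":
    # Example: a 16-row array shared by eight processors.
    for k in range(8):
        print(k, rows_for_processor(k, 16))
```

Every row is handled by exactly one processor, and the numbers of rows given to any two
processors differ by at most one.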
We have implemented the above algorithm with two different arrangements of
processors.

In the linear arrangement of processors, each processor computes and sends back
computed values to its preceding processor, and thus computed values reach the root processor
via the same path in the opposite direction. Here the transpose is taken and the data is again
sent in the same way. The arrangement can be shown in the diagram given below.

host <--> root <--> proc1 <--> proc2 <--> proc3 <--> proc4 <--> proc5 <--> proc6 <--> proc7 <--> proc8

In the mesh arrangement, the root processor reads input values and sends them to
the first processor. The first processor sends them to all processors, which compute and
send back computed values to the first processor, which in turn sends them to the root
processor. Here the transpose is taken and the above steps are repeated once again. The
arrangement of processors is the same as shown for Horner's rule.



CHAPTER 5 


IDENTIFICATION OF ENGLISH ALPHABET 
AND ARABIC NUMERALS 


5 1 INTRODUCTION 

In previous chapters, we have developed the algorithms for finding 2-D Galois
transform (GT) coefficients from images and vice versa. We are going to study the use of
these coefficients and real images in the process of identification. Up till now, the methods
which have been developed for identification use segmentation techniques like line and
edge detection for image enhancement and then pattern recognition techniques for image
recognition. The method that we are going to use is based on matching the Galois
coefficients of an image with Galois coefficients stored in the computer memory. Although this
type of method would require a huge memory, we are not going to compare this method
with existing ones or test its viability. We are only trying to devise some method which is
theoretically possible. As some criterion is necessary to evaluate its performance, we are
comparing it with a method which is based on matching of images (termed real space
matching in the remaining part of the thesis).

As the whole set of images would be too large a set to deal with, we are restricting
our attention to the 8x8 binary pixel representation of the English alphabet and Arabic
numerals. The GT coefficients take values from GF(2^3). In all, there are sixty two
different shapes (twenty six for the small alphabet, twenty six for the capital alphabet and ten
for Arabic numerals). The images of the English alphabet and Arabic numerals (jointly
termed alphabet in the remaining part of the thesis) are given in Appendix A. These images
are of the IBM font of the alphabet used in the PC. Some examples of images are
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a 3 K 


0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

1 

1 

1 

1 

0 

0 

1 

0 

0 

0 

0 

1 0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

1 

0 

1 

0 

0 

0 

1 

00 

0 

1 

1 

1 

1 

1 

1 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

1 

0 

0 

1 

0 

00 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

1 

1 

1 

1 

1 

0 

0 

1 

1 

1 

0 

0 

00 

0 

0 

1 

1 

1 

1 

1 

1 

0 

0 

0 

0 

0 

0 

0 

1 

0 

1 

0 

0 

1 

0 

00 

0 

1 

0 

0 

0 

0 

0 

1 

0 

1 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

1 

00 

0 

1 

1 

1 

1 

1 

1 

1 

0 

0 

1 

1 

1 

1 

1 

0 

0 

1 

0 

0 

0 

0 

10 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

00 

0 


The coefficients of the different alphabet are given in Appendix B. The corresponding
coefficients of the above alphabet are

    a             3             K

01357246      01474667      10526374
00000000      01474667      11643725
43501724      70206261      73534524
75623071      60334014      65223577
67213260      24704746      57575212
62046315      40051577      42236365
74150527      56047466      34455331
46334105      37667047      26321632

Here 0 stands for 0, 1 stands for 1, 2 stands for $\alpha$, 3 stands for $\alpha^2$, 4 stands for
$\alpha^3$, and so on up to 7, which stands for $\alpha^6$, where $\alpha$ is the primitive
element of GF(2^3).

As the number of possible shapes is limited to sixty two and we are only taking the
binary representation of images, we should require at least six comparisons
(because $2^6 = 64 > 62$) to identify a shape in real space (a tree like structure). Similarly,
for coefficient space, as the number of levels is eight, we should require at least two
comparisons (because $8^2 = 64 > 62$) for identification. Thus simple mathematics
tells us that we are at an advantage (as regards the number of comparisons or time required
for identification) in the coefficient domain. But we are at a disadvantage in the coefficient
domain because of the time required for calculation of coefficients (assuming data is being
transmitted and stored in the form of real space). Let us see whether this trade-off in time
will be beneficial (i.e. less time is required) for identification in the real sample domain or in
the coefficient domain. We have developed three methods for identification. These methods
have been developed in a somewhat ad hoc manner, only for the purpose of comparing the
performance of identification in the sample domain and in the coefficient domain. The main
motivation behind the methods developed has been to try to minimize the average time
required for identification of any alphabet. We have tried to keep in mind the probability
of occurrence of different alphabet in devising the methods. The method should be such
that, in the calculation of the average time required for identification, alphabet like t, e, a
should require fewer comparisons for identification than x, q, z. But I do not contend that the
methods developed are of least average time in identification of the above sixty two shapes. For
developing such methods one would require a deep study of operations research. Also, our
basic aim in this thesis is different from that. In a later part of this chapter we will briefly
touch on the problem of what the ideal structure of the alphabet (i.e. their shapes) should be
to have quick identification in both domains.
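
The two lower bounds quoted above (six binary comparisons, two eight-level comparisons)
follow from the same small calculation, sketched here in Python for illustration. The
function name min_comparisons is hypothetical.

```python
def min_comparisons(shapes, levels):
    """Smallest d such that levels**d is at least the number of shapes."""
    d = 0
    reach = 1
    while reach < shapes:
        reach *= levels   # one more comparison multiplies the outcomes by levels
        d += 1
    return d


if __name__ == "__main__":
    print(min_comparisons(62, 2))  # binary pixels in real space
    print(min_comparisons(62, 8))  # eight-level coefficients
```

These are lower bounds for an ideal decision tree; they say nothing about the cost of
computing the coefficients first.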
Of the three methods:

Two methods are based on the differences in the coefficient domain, and
One method is based on the differences in the real domain.
We are going to study the algebra behind the three methods. Also we will study the
three methods (algorithms) in detail. These algorithms have been developed on the transputer
facility available in the IP lab.

5.2 ALGEBRAIC PROPERTIES OF GALOIS COEFFICIENTS IN 8×8 MATRICES

As we study the different coefficients, we observe some properties of the Galois coefficients. We have observed that these coefficients satisfy the conjugacy constraints given in example 2.2.1. The conjugacy classes are enumerated here again. Each class is listed by the index pairs (i,j) of its coefficients $a_{i,j}$. In each class, each element is the square of the preceding element.

(1)  {(-∞,-∞)}    (2)  {(-∞,0)}    (3)  {(0,-∞)}    (4)  {(0,0)}

(5)  {(-∞,1), (-∞,2), (-∞,4)}      (6)  {(-∞,3), (-∞,6), (-∞,5)}
(7)  {(0,1), (0,2), (0,4)}         (8)  {(0,3), (0,6), (0,5)}
(9)  {(1,-∞), (2,-∞), (4,-∞)}      (10) {(1,0), (2,0), (4,0)}
(11) {(1,1), (2,2), (4,4)}         (12) {(1,2), (2,4), (4,1)}
(13) {(1,3), (2,6), (4,5)}         (14) {(1,4), (2,1), (4,2)}
(15) {(1,5), (2,3), (4,6)}         (16) {(1,6), (2,5), (4,3)}
(17) {(3,-∞), (6,-∞), (5,-∞)}      (18) {(3,0), (6,0), (5,0)}
(19) {(3,1), (6,2), (5,4)}         (20) {(3,2), (6,4), (5,1)}
(21) {(3,3), (6,6), (5,5)}         (22) {(3,4), (6,1), (5,2)}
(23) {(3,5), (6,3), (5,6)}         (24) {(3,6), (6,5), (5,3)}

The four coefficients given by

(1) {(-∞,-∞)}    (2) {(-∞,0)}    (3) {(0,-∞)}    (4) {(0,0)}

are either 0 or 1.
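
The class structure above can be reproduced mechanically. The following is a minimal sketch, assuming only that squaring a coefficient doubles both of its indices modulo 7 (because $x^7 = 1$ for every non-zero x in GF(8)) and leaves the indices -∞ and 0 unchanged. The names `NEG_INF`, `double` and `conjugacy_classes` are illustrative and do not come from the thesis programs.

```python
NEG_INF = "-inf"  # stands for the index -∞

def double(i):
    """Index of the squared coefficient: 2*i mod 7, with -∞ (and 0) left fixed."""
    return i if i == NEG_INF else (2 * i) % 7

def conjugacy_classes():
    """Group the 64 index pairs (i, j) into orbits under repeated squaring."""
    indices = [NEG_INF, 0, 1, 2, 3, 4, 5, 6]
    seen, classes = set(), []
    for i in indices:
        for j in indices:
            if (i, j) in seen:
                continue
            orbit, p = [], (i, j)
            while p not in seen:          # follow squarings until the orbit closes
                seen.add(p)
                orbit.append(p)
                p = (double(p[0]), double(p[1]))
            classes.append(orbit)
    return classes

classes = conjugacy_classes()
print(len(classes))                                 # 24
print([c for c in classes if len(c) == 1])          # the four self-conjugate pairs
```

The classes come out in a different order from the numbering used above, but each orbit lists its elements in the same "square of the preceding element" order, e.g. (1,3), (2,6), (4,5).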


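We can also find relationships between weighted sums of coefficients.
For example, take the sum of the first row:

$$a_{-\infty,-\infty} + a_{-\infty,0} + a_{-\infty,1} + a_{-\infty,2} + a_{-\infty,3} + a_{-\infty,4} + a_{-\infty,5} + a_{-\infty,6}$$

Because $a_{-\infty,j} = \sum_{x} x^{j} f(0,x)$,

therefore L.H.S. $= \sum_{j} \sum_{x} x^{j} f(0,x) + f(0,0)$,

where the first term is for $j = 0,1,2,3,4,5,6$ and the second term is for $j = -\infty$. Hence

L.H.S. $= \sum_{x} f(0,x) \sum_{j} x^{j} + f(0,0)$.

Because $\sum_{j} x^{j}$ is zero for all x except x = 0 and x = 1 (i.e. for $x = \alpha, \alpha^2, \alpha^3, \alpha^4, \alpha^5, \alpha^6$ the sum $\sum_{j} x^{j}$ is 0, and for x = 0, 1 it is 1),

therefore L.H.S. $= (f(0,1) + f(0,0)) + f(0,0) = f(0,1)$.

Take the sum of the second row:

$$a_{0,-\infty} + a_{0,0} + a_{0,1} + a_{0,2} + a_{0,3} + a_{0,4} + a_{0,5} + a_{0,6}$$

Because $a_{0,j} = \sum_{x_1} \sum_{x_2} x_2^{j} f(x_1,x_2)$,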
therefore L.H.S. $= \sum_{x_1} \sum_{x_2} f(x_1,x_2) \sum_{j} x_2^{j} + \sum_{x_1} f(x_1,0)$.

Because $\sum_{j} x_2^{j}$ is zero for all $x_2$ except $x_2 = 0$ and $x_2 = 1$, therefore

L.H.S. $= \sum_{x_1} f(x_1,1)$, i.e. the sum of the second column in the function domain.

Let us take the weighted sum of the third row. What should the weights be, and how do we decide them? Instead of $\sum_{j} x_2^{j}$ we should have a polynomial that is zero for all non-zero $x_2$ except $x_2 = \alpha$. Such a polynomial is

$$g(x) = (x-1)(x-\alpha^2)(x-\alpha^3)(x-\alpha^4)(x-\alpha^5)(x-\alpha^6)$$

$$g(x) = x^6 + \alpha x^5 + \alpha^2 x^4 + \alpha^3 x^3 + \alpha^4 x^2 + \alpha^5 x + \alpha^6$$

$g(x) = 0$ for $x = 1, \alpha^2, \alpha^3, \alpha^4, \alpha^5, \alpha^6$, and $g(x) = \alpha^6$ for $x = 0$ and $x = \alpha$.

Using the above property, we observe that if we multiply the coefficients by the above weights, then the weighted sum of the third row is

$$\alpha^6 a_{1,-\infty} + \alpha^6 a_{1,0} + \alpha^5 a_{1,1} + \alpha^4 a_{1,2} + \alpha^3 a_{1,3} + \alpha^2 a_{1,4} + \alpha a_{1,5} + a_{1,6}$$

Because $a_{1,j} = \sum_{x_1} \sum_{x_2} x_1 x_2^{j} f(x_1,x_2)$,

therefore L.H.S. $= \sum_{x_1} \sum_{x_2} x_1 f(x_1,x_2)\, g(x_2) + \alpha^6 \sum_{x_1} x_1 f(x_1,0)$,

where the first term is for $j = 0,1,2,3,4,5,6$ and the second term is for $j = -\infty$. The first term is zero for all $x_2$ except $x_2 = 0$ and $x_2 = \alpha$.

Therefore L.H.S. $= \alpha^6 \sum_{x_1} x_1 f(x_1,0) + \alpha^6 \sum_{x_1} x_1 f(x_1,\alpha) + \alpha^6 \sum_{x_1} x_1 f(x_1,0)$

Therefore L.H.S. $= \alpha^6 \sum_{x_1} x_1 f(x_1,\alpha)$, that is,

$$\alpha^6 a_{1,-\infty} + \alpha^6 a_{1,0} + \alpha^5 a_{1,1} + \alpha^4 a_{1,2} + \alpha^3 a_{1,3} + \alpha^2 a_{1,4} + \alpha a_{1,5} + a_{1,6} = \alpha^6 \sum_{x_1} x_1 f(x_1,\alpha)$$

Multiplying both sides by $\alpha$ (and using $\alpha^7 = 1$) we get

$$a_{1,-\infty} + a_{1,0} + \alpha^6 a_{1,1} + \alpha^5 a_{1,2} + \alpha^4 a_{1,3} + \alpha^3 a_{1,4} + \alpha^2 a_{1,5} + \alpha a_{1,6} = \sum_{x_1} x_1 f(x_1,\alpha)$$

Therefore the weighted sum of the third row is equivalent to the third Galois coefficient ($a_1$) of the vector $f(x_1,\alpha)$.

Similarly we would get
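$$a_{2,-\infty} + a_{2,0} + \alpha^5 a_{2,1} + \alpha^3 a_{2,2} + \alpha a_{2,3} + \alpha^6 a_{2,4} + \alpha^4 a_{2,5} + \alpha^2 a_{2,6} = \sum_{x_1} x_1^2 f(x_1,\alpha^2)$$

$$a_{3,-\infty} + a_{3,0} + \alpha^4 a_{3,1} + \alpha a_{3,2} + \alpha^5 a_{3,3} + \alpha^2 a_{3,4} + \alpha^6 a_{3,5} + \alpha^3 a_{3,6} = \sum_{x_1} x_1^3 f(x_1,\alpha^3)$$

$$a_{4,-\infty} + a_{4,0} + \alpha^3 a_{4,1} + \alpha^6 a_{4,2} + \alpha^2 a_{4,3} + \alpha^5 a_{4,4} + \alpha a_{4,5} + \alpha^4 a_{4,6} = \sum_{x_1} x_1^4 f(x_1,\alpha^4)$$

$$a_{5,-\infty} + a_{5,0} + \alpha^2 a_{5,1} + \alpha^4 a_{5,2} + \alpha^6 a_{5,3} + \alpha a_{5,4} + \alpha^3 a_{5,5} + \alpha^5 a_{5,6} = \sum_{x_1} x_1^5 f(x_1,\alpha^5)$$

$$a_{6,-\infty} + a_{6,0} + \alpha a_{6,1} + \alpha^2 a_{6,2} + \alpha^3 a_{6,3} + \alpha^4 a_{6,4} + \alpha^5 a_{6,5} + \alpha^6 a_{6,6} = \sum_{x_1} x_1^6 f(x_1,\alpha^6)$$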

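Similar relations can be obtained for the weighted sums of columns:

$$a_{-\infty,-\infty} + a_{0,-\infty} + a_{1,-\infty} + a_{2,-\infty} + a_{3,-\infty} + a_{4,-\infty} + a_{5,-\infty} + a_{6,-\infty} = f(1,0)$$

$$a_{-\infty,0} + a_{0,0} + a_{1,0} + a_{2,0} + a_{3,0} + a_{4,0} + a_{5,0} + a_{6,0} = \sum_{x_2} f(1,x_2)$$

$$a_{-\infty,1} + a_{0,1} + \alpha^6 a_{1,1} + \alpha^5 a_{2,1} + \alpha^4 a_{3,1} + \alpha^3 a_{4,1} + \alpha^2 a_{5,1} + \alpha a_{6,1} = \sum_{x_2} x_2 f(\alpha,x_2)$$

$$a_{-\infty,2} + a_{0,2} + \alpha^5 a_{1,2} + \alpha^3 a_{2,2} + \alpha a_{3,2} + \alpha^6 a_{4,2} + \alpha^4 a_{5,2} + \alpha^2 a_{6,2} = \sum_{x_2} x_2^2 f(\alpha^2,x_2)$$

$$a_{-\infty,3} + a_{0,3} + \alpha^4 a_{1,3} + \alpha a_{2,3} + \alpha^5 a_{3,3} + \alpha^2 a_{4,3} + \alpha^6 a_{5,3} + \alpha^3 a_{6,3} = \sum_{x_2} x_2^3 f(\alpha^3,x_2)$$

$$a_{-\infty,4} + a_{0,4} + \alpha^3 a_{1,4} + \alpha^6 a_{2,4} + \alpha^2 a_{3,4} + \alpha^5 a_{4,4} + \alpha a_{5,4} + \alpha^4 a_{6,4} = \sum_{x_2} x_2^4 f(\alpha^4,x_2)$$

$$a_{-\infty,5} + a_{0,5} + \alpha^2 a_{1,5} + \alpha^4 a_{2,5} + \alpha^6 a_{3,5} + \alpha a_{4,5} + \alpha^3 a_{5,5} + \alpha^5 a_{6,5} = \sum_{x_2} x_2^5 f(\alpha^5,x_2)$$

$$a_{-\infty,6} + a_{0,6} + \alpha a_{1,6} + \alpha^2 a_{2,6} + \alpha^3 a_{3,6} + \alpha^4 a_{4,6} + \alpha^5 a_{5,6} + \alpha^6 a_{6,6} = \sum_{x_2} x_2^6 f(\alpha^6,x_2)$$

These are the relations that we are going to use to distinguish between the different characters of the alphabet.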
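
The weighted-sum relations of this section can be checked numerically. The sketch below is an illustration under stated assumptions, not the transputer program: GF(8) is built from the primitive polynomial $x^3 + x + 1$ (the thesis does not fix the polynomial in this section), field elements are stored as 3-bit integers, rows and columns of the image are indexed by the elements 0, 1, α, ..., α^6 in that order, and all function names are hypothetical. The conventions $x^{-\infty} = 1$ only for x = 0 and $0^0 = 1$ are the ones implied by the derivation above.

```python
# GF(8) from the primitive polynomial x^3 + x + 1 (an assumption of this sketch).
EXP = [1, 2, 4, 3, 6, 7, 5]              # EXP[k] = alpha^k as a 3-bit integer
LOG = {v: k for k, v in enumerate(EXP)}
NEG_INF = None                            # stands for the index -∞
ELEMS = [0] + EXP                         # 0, 1, alpha, ..., alpha^6
IDX = [NEG_INF, 0, 1, 2, 3, 4, 5, 6]

def mul(a, b):
    """Product in GF(8); addition in GF(8) is bitwise XOR."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 7]

def power(x, e):
    """x**e with x**(-∞) = 1 only for x = 0, and 0**0 = 1."""
    if e is NEG_INF:
        return 1 if x == 0 else 0
    if e == 0:
        return 1
    if x == 0:
        return 0
    return EXP[(LOG[x] * e) % 7]

def galois_transform(f):
    """a[i, j] = sum over x1, x2 of x1^i * x2^j * f(x1, x2), with f an 8x8 0/1 image."""
    a = {}
    for i in IDX:
        for j in IDX:
            s = 0
            for r, x1 in enumerate(ELEMS):
                for c, x2 in enumerate(ELEMS):
                    if f[r][c]:
                        s ^= mul(power(x1, i), power(x2, j))
            a[(i, j)] = s
    return a

def weighted_row_sum(a, k):
    """a[k,-∞] + sum_j alpha^(-k*j) * a[k,j], for a row index k in 1..6."""
    s = a[(k, NEG_INF)]
    for j in range(7):
        s ^= mul(EXP[(-k * j) % 7], a[(k, j)])
    return s

def direct_sum(f, k):
    """sum over x1 of x1^k * f(x1, alpha^k), computed in the sample domain."""
    c = ELEMS.index(EXP[k % 7])
    s = 0
    for r, x1 in enumerate(ELEMS):
        if f[r][c]:
            s ^= power(x1, k)
    return s

# An arbitrary test image (not one of the thesis characters).
image = [[1 if (3 * r + 5 * c + r * c) % 3 == 0 else 0 for c in range(8)]
         for r in range(8)]
coeffs = galois_transform(image)
print(all(weighted_row_sum(coeffs, k) == direct_sum(image, k) for k in range(1, 7)))
```

The weights $\alpha^{-kj}$ used in `weighted_row_sum` are exactly the ones written out row by row above, e.g. $\alpha^5, \alpha^3, \alpha, \alpha^6, \alpha^4, \alpha^2$ for k = 2.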


5.3 METHOD 1: IDENTIFICATION IN COEFFICIENT SPACE

On observing the coefficient space of the alphabet, we find that the diagonal entries are mostly different for different characters. Therefore our first method is based on the diagonal entries. In this method we initially compare the elements $a_{1,1}$ and $a_{3,3}$. The other diagonal elements, i.e. $a_{2,2}$, $a_{4,4}$, $a_{5,5}$ and $a_{6,6}$, are conjugates of these two elements. On the basis of this comparison we get the following table. For those entries of the table where more than one character is present, we compare $a_{1,0}$. The table obtained after these three comparisons is given below.

Here 0 stands for 0, 1 stands for 1, 2 stands for $\alpha$, 3 stands for $\alpha^2$, 4 stands for $\alpha^3$, and so on up to 7, which stands for $\alpha^6$, where $\alpha$ is the primitive element of GF($2^3$).


a_{1,1}  a_{3,3}  Alphabet       Third comparison (a_{1,0})

0        0        D,U,Z          D(7),U(4),Z(6)
0        1        v
0        3        p
0        4        w
0        6        h
0        7        c
1        1        r
1        3        a,x
1        4        g,N
1        6        1
1        7        Q
2        0        E
2        1        i,4            i(0),4(4)
2        4        1,T,8,W,B,3    1(0),T(0),8(1),W(4),B(7),3(7)  unidentified
2        5        q,J            q(3),J(2)
2        6        0
3        1        G,2            G(4),2(1)
3        2        S
3        5        H
4        0        f
4        2        e
4        3        u
4        4        j,n,7          j(6),n(2),7(1)
4        5        M
4        6        C,F            C(4),F(7)
4        7        t
5        0        b,5,6
5        3        s,A            s(3),A(4)
5        5        k,K,R          R(7),k(?),K(7)  unidentified
5        6        L,0            L(7),0(4)
6        2        X
6        3        d,9,0          d(2),0(6),9(3)
6        6        m,V            m(4),V(5)
7        0        Y
7        1        P
7        5        I
7        6        y,z            y(7),z(0)



After three comparisons, only three cases remain where identification has not been completed. In these three cases $a_{-\infty,-\infty}$ is compared to finally identify all the remaining characters. Thus we observe that two to four comparisons of different coefficients are necessary to distinguish the characters in the coefficient domain.
Characters requiring two comparisons = 21
Characters requiring three comparisons = 34
Characters requiring four comparisons = 7
Assuming equal probability of occurrence we would get
Average number of comparisons = (2×21 + 3×34 + 4×7)/62
Therefore average number of comparisons = 2.77
As we have tried to have fewer comparisons for the more probable characters, the real value would be less than 2.77.
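
The average above, and its probability-weighted counterpart, can be computed as in the following minimal sketch. The function names are illustrative, and the three-character probability table is a made-up example that only shows how unequal probabilities of occurrence pull the expected value down; it is not measured letter-frequency data.

```python
def average_comparisons(counts):
    """counts maps 'number of comparisons' to 'number of characters needing that many'."""
    total = sum(counts.values())
    return sum(k * n for k, n in counts.items()) / total

def expected_comparisons(probability, comparisons):
    """Probability-weighted average; both arguments are dicts keyed by character."""
    return sum(probability[ch] * comparisons[ch] for ch in probability)

method1 = {2: 21, 3: 34, 4: 7}
print(round(average_comparisons(method1), 2))   # 2.77

# Hypothetical illustration: frequent characters resolved early, a rare one late.
toy_probability = {"e": 0.5, "t": 0.3, "z": 0.2}
toy_comparisons = {"e": 2, "t": 2, "z": 4}
print(expected_comparisons(toy_probability, toy_comparisons))
```

With equal probabilities the toy example would average (2+2+4)/3 ≈ 2.67 comparisons, whereas the weighted value is about 2.4, which is the effect referred to in the last sentence above.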

5.4 METHOD 2: IDENTIFICATION IN COEFFICIENT SPACE

The above method has one defect: it is highly error prone. For example, if the incoming character changes at one position, i.e. a transmitted 0 is received as 1, then the coefficient space of the received character can be entirely different from that of the transmitted character. In the above method each coefficient that we match depends on the whole real space. If we choose the coefficient to be matched such that it depends only on one row or one column of real space, then the chances of wrong identification are reduced. If we take the weighted sums of the fourth row and the fourth column in coefficient space, then we get numbers that depend only on the fourth column and the fourth row of real space respectively. We have chosen these rows for weighted addition because most of the information in the sample domain is centered around them. The relations were explained in section 5.2. The pertinent relations are written here again.

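$$c_1 = a_{2,-\infty} + a_{2,0} + \alpha^5 a_{2,1} + \alpha^3 a_{2,2} + \alpha a_{2,3} + \alpha^6 a_{2,4} + \alpha^4 a_{2,5} + \alpha^2 a_{2,6} = \sum_{x_1} x_1^2 f(x_1,\alpha^2)$$

$$c_2 = a_{-\infty,2} + a_{0,2} + \alpha^5 a_{1,2} + \alpha^3 a_{2,2} + \alpha a_{3,2} + \alpha^6 a_{4,2} + \alpha^4 a_{5,2} + \alpha^2 a_{6,2} = \sum_{x_2} x_2^2 f(\alpha^2,x_2)$$

Here c1 and c2 are the modified coefficients to be compared. The table obtained after these two comparisons is given below.

Here 0 stands for 0, 1 stands for 1, 2 stands for $\alpha$, 3 stands for $\alpha^2$, 4 stands for $\alpha^3$, and so on up to 7, which stands for $\alpha^6$, where $\alpha$ is the primitive element of GF($2^3$).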

c1   c2   Alphabet

1    4    c,o,f,g
2    2    0
2    3    E
2    4    p,q,K,k
2    5    Z,i
2    6    d,9
2    7    3,5,6,8,b,B,S,G
3    4    r
4    0    C,L
4    1    J
4    2    u
4    4    v,y,D,0,Q,U,V
5    0    s
5    2    M,N
5    3    F
5    4    a,e
5    5    X,4,7
5    6    A
5    7    P,h,z,R
6    2    w
6    4    J,6
6    5    I,l,1,T
6    6    n
6    7    2
7    2    W,m,x
7    5    Y
7    7    t

After these two comparisons we compare weighted additions of different rows/columns (depending on the values of c1 and c2) to identify the still unidentified cases.

Characters requiring two comparisons = 13
Characters requiring three comparisons = 35
Characters requiring four comparisons = 14
Assuming equal probability of occurrence we would get
Average number of comparisons = (2×13 + 3×35 + 4×14)/62
Average number of comparisons = 3.01
Thus we observe that in this method the average number of comparisons has increased, because we are comparing only a restricted portion of real space. Also, this method is more complex than the previous method.


5.5 METHOD 3: IDENTIFICATION IN REAL SPACE


In real-space identification of the alphabet, there are two methods. The first is based on six to eight successive comparisons of different points, and the second on comparison of additions of rows and columns. The second method has the advantage of error reduction, because two errors of opposite signs cancel if they occur on the same row/column.

Second method: Initially we count the number of 1's in the fourth row and in the fourth column as s1 and s2. These two numbers are compared with their values stored in computer memory. After these two comparisons, for those entries where complete identification has not been done, we match additions of other rows/columns (depending on the values of s1 and s2).

$$s_1 = f(3,0)+f(3,1)+f(3,2)+f(3,3)+f(3,4)+f(3,5)+f(3,6)+f(3,7)$$

$$s_2 = f(0,3)+f(1,3)+f(2,3)+f(3,3)+f(4,3)+f(5,3)+f(6,3)+f(7,3)$$
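
The two counts can be sketched in a few lines. This is only an illustration: the function name `row_column_counts` and the 8×8 test glyph are hypothetical (the glyph is not one of the sixty-two stored characters), and rows and columns are numbered from 0 as in the formulas above, so the "fourth" row and column have index 3.

```python
def row_column_counts(image, row=3, col=3):
    """Return (s1, s2): the number of 1's in the given row and in the given column."""
    s1 = sum(image[row])
    s2 = sum(image[r][col] for r in range(len(image)))
    return s1, s2

# Hypothetical glyph: a vertical stem in column 3 and a horizontal bar in row 3.
glyph = [[0] * 8 for _ in range(8)]
for r in range(1, 7):
    glyph[r][3] = 1
for c in range(1, 6):
    glyph[3][c] = 1

print(row_column_counts(glyph))       # (5, 6)

# Two errors of opposite sign on the same row: a 1 received as 0, a 0 received as 1.
noisy = [line[:] for line in glyph]
noisy[3][1] = 0
noisy[3][6] = 1
print(row_column_counts(noisy))       # (5, 6): the two errors cancel in s1
```

The second print shows the error-cancelling behaviour mentioned above: the row count is unchanged even though two samples of the row were corrupted.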


We get the following table:

s1   s2   Alphabet

1    1    J,X,L
1    2    C,J,7
1    3    s,a,2,z,Z
1    4    Y
1    6    1,l,T,I
1    7    r,u,v,x,n,U,V
2    1    c,y,o,p,q,D,0,Q
2    2    e,g
2    3    4
2    7    M,N
3    1    K,k
3    2    w,0
3    3    m
3    4
3    5    W
5    2    F
5    3    3,8,S,E,G
5    6    t
5    7    f
6    1    h
6    2    b,d,6,P,R
6    3    5,B,9
7    1    H
7    2    A

Characters requiring two comparisons = 11
Characters requiring three comparisons = 37
Characters requiring four comparisons = 12
Characters requiring five comparisons = 2
Assuming equal probability of occurrence we would get
Average number of comparisons = (2×11 + 3×37 + 4×12 + 5×2)/62
Average number of comparisons = 3.08

5.6 PERFORMANCE COMPARISON

Thus we observe that although the number of comparisons required in this method is less than the number required in the successive-comparison method (successive comparison of positions would have required at least 6 comparisons), it is more than that of the first method. On comparing the performance of the three methods, we find that the least number of comparisons is required in the first case, but time-wise the third method is best, as it does not require the coefficients to be calculated. What would be the effect as the size increases?

Let us assume that we have to identify all the possible shapes from a field of matrices of size n×n taking values from GF(2). Then the total number of input shapes possible is $2^{n^2}$. The coefficient space would take values from GF(n). Let us assume that we are using the 8-node transputer facility to calculate the GT coefficients, with a systolic array arrangement of processors. We know that the number of finite algebra multiplications/additions required to calculate the Galois coefficients is $2n^3$, where the size of the array is n×n. As there are eight processors, if the time required for one finite algebra addition is k, then

total time required for calculation of coefficients = $(2 n^3 k)/8$

As the number of possible shapes is $2^{n^2}$, the number of successive comparisons required in real space to identify any shape is $n^2$. The number of comparisons required in coefficient space is $n^2/m$, where $m = \log_2 n$. If the time required for one comparison is p, then

Total time required for identification in real space = $n^2 p$

Total time required for identification in coefficient space = $(n^2 p)/m + (2 n^3 k)/8$

As the second term (in the second case) grows as $n^3$, the time required for calculation dominates over the time required for comparison. Thus for identification purposes real space is better. But if we assume that we are also storing and transmitting data in coefficient form, then identification in coefficient space would be better than identification in real space.
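
The two cost expressions can be compared numerically as follows. This is a sketch of the model stated above, nothing more: the function names are illustrative, and the values chosen for p and k (one time unit each) are arbitrary placeholders, not measured transputer timings.

```python
import math

def time_real_space(n, p):
    """n^2 successive comparisons, each taking time p."""
    return n * n * p

def time_coefficient_space(n, p, k, processors=8):
    """n^2/m comparisons (m = log2 n) plus 2*n^3 finite algebra operations shared by the processors."""
    m = math.log2(n)
    return (n * n * p) / m + (2 * n ** 3 * k) / processors

for n in (8, 16, 32, 64):
    real = time_real_space(n, p=1.0)
    coeff = time_coefficient_space(n, p=1.0, k=1.0)
    stored = time_coefficient_space(n, p=1.0, k=0.0)   # coefficients already available
    print(n, real, round(coeff, 1), round(stored, 1))
```

Setting k = 0 models the case where data is already stored and transmitted in coefficient form: only the comparison term remains, and it is smaller than the real-space cost by the factor m.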

What should be the ideal shapes of the alphabet for quick identification?
As we have seen, identification of the sixty-two characters should have required only 2 comparisons (in coefficient space), but our method requires 2.77 comparisons. This increase is due to the particular images/shapes of the characters. If there had been a set of two coefficients that is different for every character, we would have identified each character in only two comparisons. As there is no such set, to obtain one we would have to change the coefficients (of that set) for some characters. We can choose any two coefficients for the set, but we should also try to minimize changes. Therefore we will choose the set which has the maximum number of two-comparison identifications. As a change in any coefficient changes the whole image, we would get a new group of images. This group of shapes is not unique. Also, this group may not work for identification in real space, i.e. we may not get a set of six positions whose comparison identifies every character.



CHAPTER 6 


SUMMARY AND CONCLUSIONS 


6.1 RESULTS

We are going to compare the timings of the different algorithms that were implemented on the transputer facility to calculate the Galois Transform coefficients. Timings have been measured for different array sizes in sequential, parallel one-node and parallel eight-node configurations.


Timings for different array sizes (systolic array) are given below.


               Execution time USING SYSTOLIC ARRAY
ARRAY SIZE
(n×n)        SEQ       PAR(1 node)   PAR(8 node)   Speedup

4x4          5.2       5.7           3.9           1.33
8x8          49.3      53.1          17.1          2.88
16x16        498       547           107           4.65
32x32        5003      5500          845           5.92
64x64        49.6s     53.5s         7.1s          6.93
128x128      501s      540s          69s           7.26
256x256      75.4m     80m           10.1m         7.46


s denotes time in seconds and m denotes time in minutes.
All other timings are in milliseconds.


We note that the timings under each heading increase by about 10 times when we increase the size of the array by a factor of 4 (i.e. when we double n). The bulk of the time required for calculating the coefficients goes into finite algebra additions. As the number of finite algebra additions is directly proportional to $n^3$, when we double n their number increases by a factor of eight. There is a further increase in time due to the increase in the number of finite algebra multiplications, which pushes the overall increase in time up to about 10.
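
The "about 10 times" growth can be checked directly from the sequential column of the systolic array table. The sketch below assumes our reading of that column with the decimal points restored and all entries converted to milliseconds. The dictionary `seq_ms` and the function `growth_ratios` are illustrative names.

```python
# Sequential systolic-array timings in milliseconds, as read from the table
# (seconds and minutes converted; decimal points restored, which is an assumption).
seq_ms = {
    4: 5.2,
    8: 49.3,
    16: 498,
    32: 5003,
    64: 49.6e3,
    128: 501e3,
    256: 75.4 * 60e3,
}

def growth_ratios(timings):
    """Ratio of each timing to the timing at the previous (half-sized) n."""
    sizes = sorted(timings)
    return {b: timings[b] / timings[a] for a, b in zip(sizes, sizes[1:])}

ratios = growth_ratios(seq_ms)
for n, r in ratios.items():
    print(n, round(r, 2))
```

Every ratio lies between the factor of 8 expected from the $n^3$ additions alone and roughly 10, which is consistent with the explanation given above.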
Timings for different array sizes, FFT (linear arrangement), are given below.


               Execution time, FFT (USING LINEAR ARRANGEMENT)
ARRAY SIZE   SEQ       PAR(1 node)   PAR(8 node)   Speedup

4x4          4.6       6.9           2.8           1.64
8x8          46.0      55.4          14.1          3.26
16x16        327.6     382.4         69.5          4.71
32x32        4561      5130          645           7.07
64x64        20.5s     23.9s         3.5s          5.86
128x128      395s      434s          55.6s         7.10
256x256      37.4m     38.3m         5.2m          7.19


s denotes time in seconds and m denotes time in minutes.
All other timings are in milliseconds.


Timings for different array sizes, FFT (mesh arrangement), are given below.


               Execution time, FFT (USING MESH ARRANGEMENT)
ARRAY SIZE   SEQ       PAR(1 node)   PAR(8 node)   Speedup

4x4          4.6       6.58          2.7           1.70
8x8          46        52.8          12.2          3.77
16x16        327.6     366           67.2          4.87
32x32        4561      4926          632           7.21
64x64        20.5s     22.6s         3.3s          6.21
128x128      395s      423s          54s           7.31
256x256      28.5m     30m           3.8m          7.51


s denotes time in seconds and m denotes time in minutes.
All other timings are in milliseconds.


On comparing the timings of the linear and mesh arrangements of processors, we observe that the latter has the better execution time and speedup factor because of its lower communication overhead. In the first and second cases, the average distance of each processor from the root processor is


((1+2+3+4+5+6+7+8)/8) = 4.5 processors and
((1+2+2+2+3+3+3+3)/8) = 2.37 processors respectively.
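
The two averages can be reproduced with a short breadth-first search. In the sketch below the linear arrangement is modelled as a chain of eight processors hanging off the root, which is our reading of the hop counts 1 to 8 in the first sum. The exact wiring of the mesh is not given in this section, so its hop counts are simply taken from the second sum. All names are illustrative.

```python
from collections import deque

def hop_counts(adjacency, root):
    """Breadth-first search: number of links from the root to every other node."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [d for node, d in dist.items() if node != root]

def mean(values):
    return sum(values) / len(values)

# Linear arrangement: root = node 0, processors 1..8 in a chain (an assumption).
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 8] for i in range(9)}
linear_hops = hop_counts(chain, root=0)

# Mesh arrangement: hop counts as listed in the text.
mesh_hops = [1, 2, 2, 2, 3, 3, 3, 3]

print(mean(linear_hops))   # 4.5
print(mean(mesh_hops))     # 2.375
```

The shorter average path of the mesh is what reduces the communication overhead relative to the linear arrangement.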





Timings for different array sizes (Horner's rule) are given below.


               Execution time USING HORNER'S RULE
ARRAY SIZE   SEQ       PAR(1 node)   PAR(8 node)   Speedup

4x4          3.9       5.3           2.8           1.39
8x8          74.5      79.8          16.6          4.48
16x16        1250      1330          180           6.94
32x32        27.2s     30.0s         3.8s          7.17
64x64        7.8m      7.9m          1m            7.8
128x128      159.6m    160m          20.4m         7.82
256x256      very large time


s denotes time in seconds and m denotes time in minutes.
All other timings are in milliseconds.


On comparing the four sets of timings, we observe that the FFT algorithms give the best timings. The timings have been reduced due to the decrease in the number of computations. Compared with the other two algorithms, which require $2n^3$ multiplications and additions, the FFT algorithms require $2n^2(n_1+n_2+1)$ multiplications and $2n^2(n_1+n_2-2)$ additions (where $n_1$ and $n_2$ are factors of n). As the size increases, the speedup factor improves because communication time starts becoming insignificant compared to calculation time. The algorithm using Horner's rule has the best speedup factor, but it also has the worst timings.
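
The operation counts quoted above can be tabulated to see where the saving comes from. The sketch assumes that n is factored as $n = n_1 n_2$, which is how we read "factors of n". The function names and the particular factorizations chosen are illustrative.

```python
def direct_ops(n):
    """Multiplications (and likewise additions) for the systolic array and Horner's rule."""
    return 2 * n ** 3

def fft_mults(n, n1, n2):
    return 2 * n * n * (n1 + n2 + 1)

def fft_adds(n, n1, n2):
    return 2 * n * n * (n1 + n2 - 2)

# (n, n1, n2) with n = n1 * n2, an assumed factorization for illustration.
for n, n1, n2 in [(16, 4, 4), (64, 8, 8), (256, 16, 16)]:
    assert n == n1 * n2
    print(n, direct_ops(n), fft_mults(n, n1, n2), fft_adds(n, n1, n2))
```

For n = 64 with $n_1 = n_2 = 8$ this gives 524288 operations of each kind for the direct algorithms against 139264 multiplications and 114688 additions for the FFT, and the gap widens as n grows.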
We have also implemented three algorithms for identification of the alphabet. On comparing the timings for matching different characters, we observe that the timings are either 64 microseconds or 128 microseconds (i.e. 1 or 2 counts of the internal clock). As we cannot measure timings shorter than 64 microseconds, time comparison of these algorithms is not a suitable criterion for performance evaluation. It is therefore better to compare the average number of matchings required for identification.

Average number of comparisons required in method 1 = 2.77
Average number of comparisons required in method 2 = 3.01
Average number of comparisons required in method 3 = 3.08




6.2 SCOPE FOR FUTURE WORK


In the introduction of this thesis we started with some questions about the ability of the computer in the identification problem. These questions were:

(1) Can computers be used for identification purposes?

(2) Can a method or methods be devised for differentiating between two images which are almost similar?

(3) Is the above method dependent upon the type of image?

(4) Can this method operate in real-time conditions?

(5) What are the applications in which this method can be used?

(6) Is the method commercially viable?

Let us see whether we can answer some of them now. We have been using computers for image identification, but in a very limited way, i.e. we cannot identify a person or a fingerprint which is not in the computer memory. Also, we have to match each image with all the images present in the memory. As the methods we have devised are also dependent on matching from memory, that limitation remains. As some form of matching is ultimately necessary for identification, that limitation will remain, but we can reduce the dependence on memory in one way. If we can identify some property which is common to all images of one type (e.g. all fingerprints have a vortex-like structure), then we will have to match against a smaller number of images. What we should try is to classify the images into different groups. These groups can also be formed on the basis of the Galois transform. It may be that all faces have some Galois coefficient, or some transform, which takes only a particular value.

We are at an advantage when we use Galois coefficients for identification of almost similar images, because even for almost similar images the Galois coefficients are entirely different. The method we have developed is not dependent on the type of image (if its coefficients are stored in memory), but once we are able to classify the images it would be better to have different methods for different types of images.

The method that we have developed takes about 64 to 128 microseconds for comparison and about 10 msec to calculate the coefficients. Therefore, if we are able to reduce the time taken in calculation by using more and more nodes, then this method could operate in real time. Thus we have to develop parallel programs which use more and more processors in efficient 3-dimensional networks.

As the purpose of the method developed is to identify images, any application which requires image identification can use the above method. A special application can be in those areas where identification of two almost similar images is needed, like fingerprint matching.
76524450

9
01474667
00476674
61715562
41264132
34561332
71135437
26725312
57525431

C
01474667
11000000
40466704
70670474
44176543
60746670
66432176
77217654

F
11765432
11351211
77452761
66674213
55126435
44341657
33563172
22715324

I
01645732
01357246
07760053
06062450
05475767
04702403
03444637
02674662

L
11000000
11765432
77543217
66321765
55176543
44654321
33432176
22217654

7
00765432
11236574
15411021
12073111
17304755
13101615
14043263
16652027

A
01357246
00000000
43501724
75623071
67213260
62046315
74150527
46334105

D
10474667
10642753
70014133
60105157
50450252
40112062
30355603
20372230

G
01474667
10761411
44354664
77454277
41371205
66376267
61453210
71350621

J
01761411
01355232
23224660
35434307
50405533
52570567
30032632
20275052

M
10642753
11765432
73446764
65674774
57105230
42646677
34050132
26315002

8
01474667
01474667
15206261
12334014
63704746
13051577
72047466
45667047

B
10474667
10474667
70206261
60334014
50704746
40051577
30047466
20667047

E
11765432
11237546
77227325
66533326
55030216
44523545
33257001
22301540

H
10642753
11765432
73372230
65355603
57025244
42450252
34556036
26307372

K
10526374
11643725
73534524
65223577
57575212
42236365
34455331
26321632

N
10642753
11765432
74112066
67014143
54524350
46107157
36520363
27522307


O
01474667
10642753
46014133
74105157
42450252
67112062
65355603
73372230

R
10474667
10114167
70543462
60724735
50365304
40665327
30726230
20650472

U
10642753
01474667
46014133
74105157
42450252
67112062
65355603
73372230

X
10642753
01357246
63602044
50470731
27272647
72006756
54474556
36437663

P
10474667
11000000
71704746
61667047
54401413
41047466
36062611
27771051

S
01474667
11765432
30355603
50450252
54372230
20372230
36450252
27355603

V
10642753
01474667
56633675
24446525
22466515
37273724
55733671
33271424

Y
10642753
10231511
04773563
07264655
06700003
06432427
07002400
04060050

Q
01474667
11767446
47162751
76612413
47227164
64741153
64516547
76134376

T
11765432
11765432
07246135
06135724
05724613
04613572
03572461
02461357

W
10642753
01474667
44237006
77030546
44644177
66207540
66614764
77146767

Z
11765432
01474667
67031642
46407531
25310427
74275016
53164205
32753160